Abstract: Mathematical programming is a branch of applied
mathematics and has recently been used to derive new decoding
approaches, challenging established but often heuristic
algorithms based on iterative message passing. Concepts from 
mathematical programming used in the context of decoding 
include linear, integer, and nonlinear programming, network 
flows, notions of duality as well as matroid and polyhedral theory. 
This survey article reviews and categorizes decoding methods 
based on mathematical programming approaches for binary 
linear codes over binary-input memoryless symmetric channels. 

Index Terms: Integer programming, LP decoding, mathematical
programming, ML decoding, polyhedral theory.



I. Introduction 

Based on an integer programming (IP) formulation of
the maximum likelihood decoding (MLD) problem for binary
linear codes, linear programming decoding (LPD) was
introduced by Feldman et al. [1], [2]. Since then, LPD has
been intensively studied in a variety of articles, especially
those dealing with low-density parity-check (LDPC) codes. LDPC
codes are generally decoded by heuristic approaches called
iterative message passing decoding (IMPD), subsuming
sum-product algorithm decoding (SPAD) [3], [4] and min-sum
algorithm decoding (MSAD) [5]. In these algorithms,
probabilistic information is iteratively exchanged and updated
between component decoders. Initial messages are derived 
from the channel output. IMPD exploits the sparse structure of 
parity-check matrices of LDPC and turbo codes very well and 
achieves good performance. However, IMPD approaches are 
neither guaranteed to converge nor do they have the maximum 
likelihood (ML) certificate property, i.e., if the output is a 
codeword, it is not necessarily the ML codeword. Furthermore, 
performance of IMPD is poor for arbitrary linear block codes 
with a dense parity-check matrix. In contrast, LPD offers 
some advantages and thus has become an important alternative 
decoding technique. First, this approach is derived from the 
discipline of mathematical programming which provides ana- 
lytical statements on convergence, complexity, and correctness 
of decoding algorithms. Second, LPD is not limited to sparse 
matrices. 

This article is organized as follows. In Section II, notation
is fixed and well-known but relevant results from coding
theory and polyhedral theory are recalled. Complexity and
polyhedral properties of MLD are discussed in Section III.
In Section IV, a general description of LPD is given. Several
linear programming (LP) formulations dedicated to codes with
low-density parity-check matrices, codes with high-density
parity-check matrices, and turbo-like codes are categorized,
and their commonalities and differences are emphasized, in
Section V. Based on these LP formulations, different streams
of research on LPD have evolved. Methods focusing on
efficient realization of LPD are summarized in Section VI,
while approaches improving the error-correcting performance
of LPD at the cost of increased complexity are reviewed
in Section VII. Some concluding comments are made in
Section VIII.

II. Basics and Notation 

This section briefly introduces a number of definitions and 
results from linear coding theory and polyhedral theory which 
are most fundamental for the subsequent text. 

A binary linear block code $C$ with cardinality $2^k$ and block
length $n$ is a $k$-dimensional subspace of the vector space
$\{0,1\}^n$ defined over the binary field $\mathbb{F}_2$. $C \subseteq \{0,1\}^n$ is given
by $k$ basis vectors of length $n$ which are arranged in a $k \times n$
matrix $G$, called the generator matrix of the code $C$.¹

The orthogonal subspace $C^\perp$ of $C$ is defined as

$$C^\perp = \Big\{ y \in \{0,1\}^n : \sum_{j=1}^{n} x_j y_j \equiv 0 \pmod 2 \text{ for all } x \in C \Big\}$$

and has dimension $n - k$. It can also be interpreted as a binary
linear code of dimension $n - k$ which is referred to as the dual
code of $C$. A matrix $H \in \{0,1\}^{m \times n}$ whose $m \geq n - k$ rows
form a spanning set of $C^\perp$ is called a parity-check matrix
of $C$. It follows from this definition that $C$ is the null space
of $H$ and thus a vector $x \in \{0,1\}^n$ is contained in $C$ if
and only if $Hx \equiv 0 \pmod 2$. Normally, $m = n - k$ and
the rows of $H \in \{0,1\}^{(n-k) \times n}$ constitute a basis of $C^\perp$. It
should be pointed out, however, that most LPD approaches
(see Section VII) benefit from parity-check matrices being
extended by redundant rows. Moreover, additional rows of $H$
never degrade the error-correcting performance of LPD. This
is a major difference from IMPD, which is generally weakened
by redundant parity checks, since they introduce cycles to the
Tanner graph.
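As a small illustration of these definitions, the following Python sketch enumerates a code from a generator matrix and tests membership through the parity-check condition $Hx \equiv 0 \pmod 2$. The (5,2) matrices `G` and `H`, the redundant row in `H_ext`, and the helper names `codewords` and `in_code` are hypothetical choices made for this example only.

```python
from itertools import product

def codewords(G):
    """All GF(2) linear combinations of the rows of the generator matrix G."""
    k, n = len(G), len(G[0])
    words = set()
    for u in product([0, 1], repeat=k):  # every information word u
        x = tuple(sum(u[r] * G[r][j] for r in range(k)) % 2 for j in range(n))
        words.add(x)
    return words

def in_code(H, x):
    """x is a codeword iff every parity check holds: H x = 0 (mod 2)."""
    return all(sum(h * xj for h, xj in zip(row, x)) % 2 == 0 for row in H)

# Hypothetical (5,2) code: the rows of H span the dual code of the row space of G.
G = [[1, 0, 1, 1, 0],
     [0, 1, 0, 1, 1]]
H = [[1, 0, 1, 0, 0],
     [1, 1, 0, 1, 0],
     [0, 1, 0, 0, 1]]
C = codewords(G)

# A redundant row (sum of the first two rows of H) does not change the null space.
H_ext = H + [[0, 1, 1, 1, 0]]
same_code = all(in_code(H, x) == in_code(H_ext, x)
                for x in product([0, 1], repeat=5))
```

Here `H_ext` has $m = 4 > n - k = 3$ rows and still defines the same code, which is the situation exploited by LPD approaches that add redundant parity checks.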

¹Note that single vectors in this paper are generally column vectors;
however, in coding theory they are often used as rows of matrices. The
transposition of a column vector $a$ makes it a row vector, denoted by $a^T$.



Let $x, x' \in \{0,1\}^n$. The Hamming distance between $x$
and $x'$ is the number of entries (bits) with different values,
i.e., $d(x,x') = |\{1 \leq j \leq n : x_j \neq x'_j\}|$. The minimum
(Hamming) distance of a code, $d(C)$, is given by $d(C) =
\min\{d(x,x') : x, x' \in C,\ x \neq x'\}$. The Hamming weight of
a codeword $x \in C$ is defined as $w(x) = d(x, 0)$, i.e., the
number of ones in $x$. The minimum Hamming weight of $C$ is
$w(C) = \min\{w(x) : x \in C,\ x \neq 0\}$. For binary linear codes it
holds that $d(C) = w(C)$. The error-correcting performance of
a code is, at least at high signal-to-noise ratio (SNR), closely
related to its minimum distance.
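The identity $d(C) = w(C)$ can be checked by brute force on a small code. The sketch below is only an illustration: the generator matrix `G` and the function names are hypothetical, and exhaustive enumeration is feasible only for very small $k$.

```python
from itertools import combinations, product

def span_gf2(G):
    """All codewords generated by the rows of G over GF(2)."""
    k, n = len(G), len(G[0])
    return {tuple(sum(u[r] * G[r][j] for r in range(k)) % 2 for j in range(n))
            for u in product([0, 1], repeat=k)}

def hamming_distance(x, y):
    """Number of positions in which x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def min_distance(C):
    """d(C): minimum distance over all pairs of distinct codewords."""
    return min(hamming_distance(x, y) for x, y in combinations(sorted(C), 2))

def min_weight(C):
    """w(C): minimum number of ones over all non-zero codewords."""
    return min(sum(x) for x in C if any(x))

# Hypothetical (5,2) example code.
G = [[1, 0, 1, 1, 0],
     [0, 1, 0, 1, 1]]
C = span_gf2(G)
d_C = min_distance(C)
w_C = min_weight(C)
```

Both quantities coincide because the difference of two codewords of a linear code is again a codeword.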

Let $A \in \mathbb{R}^{m \times n}$ denote an $m \times n$ matrix and let $I = \{1,\dots,m\}$,
$J = \{1,\dots,n\}$ be the row and column index sets of $A$,
respectively. The entry in row $i \in I$ and column $j \in J$ of
$A$ is given by $A_{i,j}$. The $i$-th row and $j$-th column of $A$ are
denoted by $A_{i,\cdot}$ and $A_{\cdot,j}$, respectively. A vector $e \in \mathbb{R}^m$ is
called the $i$-th unit column vector if $e_i = 1$, $i \in I$, and $e_h = 0$
for all $h \in I \setminus \{i\}$.

A parity-check matrix $H$ can be represented by a bipartite
graph $G = (V, E)$, called its Tanner graph. The vertex set $V$ of
$G$ consists of the two disjoint node sets $I$ and $J$. The nodes in
$I$ are referred to as check nodes and correspond to the rows of
$H$, whereas the nodes in $J$ are referred to as variable nodes and
correspond to the columns of $H$. An edge $[i,j] \in E$ connects nodes
$i$ and $j$ if and only if $H_{i,j} = 1$. Let $N_i = \{j \in J : H_{i,j} = 1\}$
denote the index set of variables incident to check node $i$,
and analogously $N_j = \{i \in I : H_{i,j} = 1\}$ for $j \in J$. The
degree of a check node $i$ is the number of edges incident to
node $i$ in the Tanner graph or, equivalently, $d_c(i) = |N_i|$. The
maximum check node degree $d_c^{\max}$ is the degree of the check
node $i \in I$ with the largest number of incident edges. The
degree of a variable node $j$, $d_v(j)$, and the maximum variable
node degree $d_v^{\max}$ are defined analogously.

Fig. 1. Parity-check matrix and Tanner graph of an (8,4) code.
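The neighborhoods and degrees just defined can be read directly off a parity-check matrix. The following sketch assumes a hypothetical $3 \times 5$ matrix `H` and uses 0-based indices, so row $i$ and column $j$ of the text correspond to `i - 1` and `j - 1` in the code.

```python
def tanner_neighborhoods(H):
    """Return the check neighborhoods N_i and variable neighborhoods N_j of H."""
    m, n = len(H), len(H[0])
    N_check = {i: {j for j in range(n) if H[i][j] == 1} for i in range(m)}
    N_var = {j: {i for i in range(m) if H[i][j] == 1} for j in range(n)}
    return N_check, N_var

# Hypothetical parity-check matrix (not the (8,4) code of Fig. 1).
H = [[1, 0, 1, 0, 0],
     [1, 1, 0, 1, 0],
     [0, 1, 0, 0, 1]]
N_check, N_var = tanner_neighborhoods(H)

d_c_max = max(len(s) for s in N_check.values())    # maximum check node degree
d_v_max = max(len(s) for s in N_var.values())      # maximum variable node degree
num_edges = sum(len(s) for s in N_check.values())  # one edge per 1-entry of H
```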

Tanner graphs are an example of factor graphs, a general 
concept of graphical models which is prevalently used to 
describe probabilistic systems and related algorithms. The 
term stems from viewing the graph as the representation of 
some global function in several variables that factors into a 
product of subfunctions, each depending only on a subset of 
the variables. In case of Tanner graphs, the global function is 
the indicator function of the code, and the subfunctions are 
the parity checks according to single rows of $H$. A different
type of factor graph will appear later in order to describe
turbo codes. Far beyond this purely descriptive purpose,
factor graphs have proven successful in modern coding theory,
primarily in the context of describing and analyzing IMPD
algorithms. See [6] for a more elaborate introduction.

Let $C$ be a binary linear code with parity-check matrix $H$
and $x \in C \subseteq \{0,1\}^n$. The index set $\mathrm{supp}(x) = \{j \in J : x_j =
1\}$ is called the support of the codeword $x$. A codeword $0 \neq
x \in C$ is called a minimal codeword if there is no codeword
$0 \neq y \in C$ such that $\mathrm{supp}(y) \subsetneq \mathrm{supp}(x)$. Finally, $D$ is called
a minor code of $C$ if $D$ can be obtained from $C$ by a series
of shortening and puncturing operations.

The relationship between binary linear codes and polyhedral
theory follows from the observation that a binary linear code
can be considered a set of points in $\mathbb{R}^n$, i.e., $C \subseteq \{0,1\}^n \subseteq
\mathbb{R}^n$. In the following, some relevant results from polyhedral
theory are recalled. For a comprehensive review of polyhedral
theory the reader is referred to [7].



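Definition II.1 A subset $\mathcal{P}(A, b) \subseteq \mathbb{R}^n$ such that $\mathcal{P}(A, b) =
\{\nu \in \mathbb{R}^n : A\nu \leq b\}$, where $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$, is called
a polyhedron.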



In this article, polyhedra are assumed to be rational, i.e., the
entries of $A$ and $b$ are taken from $\mathbb{Q}$. The $i$-th row vector of
$A$ and the $i$-th entry of $b$ together define a closed halfspace
$\{\nu \in \mathbb{R}^n : A_{i,\cdot}\nu \leq b_i\}$. In other words, a polyhedron is the
intersection of a finite set of closed halfspaces. A bounded
polyhedron is called a polytope. It is known from polyhedral
theory that a polytope can equivalently be defined as the
convex hull of a finite set of points. In this work, the convex
hull of a binary linear code $C$ is denoted by $\mathrm{conv}(C)$ and
referred to as the codeword polytope.

Some characteristics of a polyhedron are its dimension, 
faces, and facets. To define them, the notion of a valid 
inequality is needed. 

Definition II.2 An inequality $r^T \nu \leq t$, where $r \in \mathbb{R}^n$ and
$t \in \mathbb{R}$, is valid for a set $\mathcal{P}(A, b) \subseteq \mathbb{R}^n$ if $\mathcal{P}(A, b) \subseteq \{\nu :
r^T \nu \leq t\}$.

The following definition of an active inequality is used in
several LPD algorithms.

Definition II.3 An inequality $r^T \nu \leq t$, where $r, \nu \in \mathbb{R}^n$ and
$t \in \mathbb{R}$, is active at $\nu^* \in \mathbb{R}^n$ if $r^T \nu^* = t$.

Valid inequalities which contain points of $\mathcal{P}(A, b)$ are of
special interest.

Definition II.4 Let $\mathcal{P}(A, b) \subseteq \mathbb{R}^n$ be a polyhedron, let $r^T \nu \leq
t$ be a valid inequality for $\mathcal{P}(A, b)$ and define $F = \{\nu \in
\mathcal{P}(A, b) : r^T \nu = t\}$. Then $F$ is called a face of $\mathcal{P}(A, b)$. $F$ is
a proper face if $F \neq \emptyset$ and $F \neq \mathcal{P}(A, b)$.

The dimension $\dim(\mathcal{P}(A, b))$ of $\mathcal{P}(A, b) \subseteq \mathbb{R}^n$ is given by the
maximum number of affinely independent points in $\mathcal{P}(A, b)$
minus one. Recall that a set of vectors $\nu^1, \dots, \nu^k$ is affinely
independent if the system $\{\sum_{i=1}^{k} \lambda_i \nu^i = 0,\ \sum_{i=1}^{k} \lambda_i = 0\}$
has no solution other than $\lambda_i = 0$ for $i = 1, \dots, k$. If
$\dim(\mathcal{P}(A, b)) = n$, then the polyhedron is full-dimensional. It
is a well-known result that if $\mathcal{P}(A, b)$ is not full-dimensional,
then there exists at least one inequality $A_{i,\cdot}\nu \leq b_i$ such that
$A_{i,\cdot}\nu = b_i$ holds for all $\nu \in \mathcal{P}(A, b)$ (see e.g. [7]). Also,
we have $\dim(F) \leq \dim(\mathcal{P}(A, b)) - 1$ for any proper face $F$ of
$\mathcal{P}(A, b)$. A face $F \neq \emptyset$ of $\mathcal{P}(A, b)$ is called a facet of $\mathcal{P}(A, b)$
if $\dim(F) = \dim(\mathcal{P}(A, b)) - 1$.

In the set of inequalities defined by $(A, b)$, some inequalities
$A_{i,\cdot}\nu \leq b_i$ may be redundant, i.e., dropping these inequalities
does not change the solution set defined by $A\nu \leq b$. A standard
result states that the facet-defining inequalities give a complete
non-redundant description of a polyhedron $\mathcal{P}(A, b)$ [7].

A point $\nu \in \mathcal{P}(A, b)$ is called a vertex of $\mathcal{P}(A, b)$ if there
exist no two other points $\nu^1, \nu^2 \in \mathcal{P}(A, b)$ such that $\nu =
\mu_1 \nu^1 + \mu_2 \nu^2$ with $0 \leq \mu_1 \leq 1$, $0 \leq \mu_2 \leq 1$, and $\mu_1 + \mu_2 = 1$.
Alternatively, vertices are zero-dimensional faces of $\mathcal{P}(A, b)$.
In an LP problem, a linear cost function is minimized on a
polyhedron, i.e., $\min\{c^T x : x \in \mathcal{P}(A, b)\}$, $c \in \mathbb{R}^n$. Unless
the LP is infeasible or unbounded, the minimum is attained
on one of the vertices.

The number of constraints of an LP problem may be
very large; e.g., Section V contains LPD formulations whose
description complexity grows exponentially with the block
length for general codes. In such a case it would be desirable to
only include the constraints which are necessary to determine 
the optimal solution of the LP with respect to a given objective 
function. This can be accomplished by iteratively solving the 
associated separation problem, defined as follows. 

Definition II.5 Let $\mathcal{P}(A, b) \subseteq \mathbb{R}^n$ be a rational polyhedron
and $\nu^* \in \mathbb{R}^n$ a rational vector. The separation problem is to
either conclude that $\nu^* \in \mathcal{P}(A, b)$ or, if not, find a rational
vector $(r, t) \in \mathbb{R}^n \times \mathbb{R}$ such that $r^T \nu \leq t$ for all $\nu \in \mathcal{P}(A, b)$
and $r^T \nu^* > t$. In the latter case, $(r, t)$ is called a valid cut.

We will see applications of this approach in Sections VI and
VII.

There is a famous result about the equivalence of optimization
and separation by Grötschel et al. [8].

Theorem II.6 Let $\mathcal{P}$ be a proper class of polyhedra (see
e.g. [8] for a definition). The optimization problem for $\mathcal{P}$ is
polynomial time solvable if and only if the separation problem
is polynomial time solvable.



III. Complexity and Polyhedral Properties 

In this section, after referencing important NP-hardness 
results for the decoding problem, we state useful properties 
of the codeword polytope, exploiting a close relation between 
coding and matroid theory. 

Integer programming provides powerful means for modeling 
several real-world problems. MLD for binary linear codes is 
modeled as an IP problem in [2], [9]. Let $y \in \mathbb{R}^n$ be the
channel output. In MLD the probability (or, in case of a
continuous-output channel, the probability density) $P(y|x)$ is
maximized over all codewords $x \in C$. Let $x^*$ denote the ML
codeword. It is shown in [1] that for a symmetric memoryless
channel the calculation of $x^*$ amounts to the minimization of
a linear cost function, namely

$$x^* = \arg\max_{x \in C} P(y|x) = \arg\min_{x \in C} \sum_{j=1}^{n} \lambda_j x_j, \qquad (1)$$

where the values $\lambda_j = \log \frac{P(y_j | x_j = 0)}{P(y_j | x_j = 1)}$ are the so-called
log-likelihood ratios (LLR). Consequently, the IP formulation of
MLD is implicitly given as

$$\min\{\lambda^T x : x \in C\}. \qquad (2)$$
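To make (1) and (2) concrete, the following sketch computes the LLRs for a binary symmetric channel and solves (2) by exhaustive search. It is a toy illustration only: the matrix `H`, the crossover probability `p = 0.1`, the received word `y`, and the function names are assumptions of this example, and enumeration is exponential in $n$.

```python
import math
from itertools import product

def bsc_llr(y, p):
    """LLRs log(P(y_j | x_j = 0) / P(y_j | x_j = 1)) for a BSC with crossover p."""
    base = math.log((1 - p) / p)
    return [base if yj == 0 else -base for yj in y]

def ml_decode(H, llr):
    """Solve min{lambda^T x : x in C} by enumerating the null space of H."""
    n = len(H[0])
    code = [x for x in product([0, 1], repeat=n)
            if all(sum(h * xj for h, xj in zip(row, x)) % 2 == 0 for row in H)]
    return min(code, key=lambda x: sum(l * xj for l, xj in zip(llr, x)))

# Hypothetical (5,2) code; y is the codeword (1,0,1,1,0) with its last bit flipped.
H = [[1, 0, 1, 0, 0],
     [1, 1, 0, 1, 0],
     [0, 1, 0, 0, 1]]
y = (1, 0, 1, 1, 1)
llr = bsc_llr(y, p=0.1)
x_ml = ml_decode(H, llr)
```

On the BSC with $p < 1/2$, a received one yields a negative LLR, so setting $x_j = 1$ there lowers the cost; minimizing the cost therefore selects the codeword closest to $y$ in Hamming distance.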



Berlekamp et al. have shown that MLD is NP-hard in [10] by a
polynomial-time reduction of the three-dimensional matching
problem to the decision version of MLD. An alternative
proof is via matroid theory: as shall be shown shortly,
there is a one-to-one correspondence between binary matroids
and binary linear codes. By virtue of this analogy, MLD is
equivalent to the minimum-weight cycle problem on binary
matroids. Since the latter contains the max-cut problem, which
is known to be NP-hard [11], as a special case, the NP-hardness
of MLD follows.

Another problem of interest in the framework of coding 
theory is the computation of the minimum distance of a given 
code. Berlekamp et al. [10] conjectured that computing the
distance of a binary linear code is NP-hard as well, which was
proved by Vardy [12] about two decades later. The minimum
distance problem can again be reformulated in a matroid
theoretic setting. In 1969 Welsh [13] formulated it as the
problem of finding a minimum cardinality circuit in linear 
matroids. 

In the following, we assume $C \subseteq \{0,1\}^n$ to be canonically
embedded in $\mathbb{R}^n$ when referring to $\mathrm{conv}(C)$ (see Fig. 2 for
an example). Replacing $C$ by $\mathrm{conv}(C)$ in (2) leads to a linear
programming problem over a polytope with integer vertices.
In general, computing an explicit representation of $\mathrm{conv}(C)$
is intractable. Nevertheless, some properties of $\mathrm{conv}(C)$ are
known from matroid theory due to the equivalence of binary
linear codes and binary matroids. In the following, some
definitions and results from matroid theory are presented. An
extensive investigation of matroids can be found in [14] or
[15]. The definition of a matroid in general is rather technical.



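Definition III.1 A matroid $\mathcal{M}$ is an ordered pair $\mathcal{M} = (J, \mathcal{U})$
where $J$ is a finite ground set and $\mathcal{U}$ is a collection of subsets
of $J$, called the independent sets, such that (a) - (c) hold.

(a) $\emptyset \in \mathcal{U}$.

(b) If $u \in \mathcal{U}$ and $v \subseteq u$, then $v \in \mathcal{U}$.

(c) If $u_1, u_2 \in \mathcal{U}$ and $|u_1| < |u_2|$, then there exists $j \in u_2 \setminus u_1$
such that $u_1 \cup \{j\} \in \mathcal{U}$.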

In this work, the class of $\mathbb{F}_2$-representable (i.e., binary)
matroids is of interest. A binary $m \times n$ matrix $H$ defines an
$\mathbb{F}_2$-representable matroid $\mathcal{M}[H]$ as follows. The ground set
$J = \{1, \dots, n\}$ is defined to be the index set of the columns
of $H$. A subset $U \subseteq J$ is independent if and only if the
column vectors $H_{\cdot,u}$, $u \in U$, are linearly independent in the
vector space defined over the field $\mathbb{F}_2$. A minimal dependent
set, i.e., a set $V \in 2^J \setminus \mathcal{U}$ such that all proper subsets of $V$
are in $\mathcal{U}$, is called a circuit of $\mathcal{M}[H]$. If a subset of $J$ is a
disjoint union of circuits then it is called a cycle.

Fig. 2. The codewords of the single parity-check code $C = \{x \in \mathbb{F}_2^3 :
x_1 + x_2 + x_3 \equiv 0 \pmod 2\}$ and the polytope $\mathrm{conv}(C)$ in $\mathbb{R}^3$.

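The incidence vector $x^{\mathcal{C}} \in \mathbb{R}^n$ corresponding to a cycle
$\mathcal{C} \subseteq J$ is defined by

$$x^{\mathcal{C}}_j = \begin{cases} 1 & \text{if } j \in \mathcal{C}, \\ 0 & \text{if } j \notin \mathcal{C}. \end{cases}$$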


The cycle polytope is the convex hull of the incidence vectors 
corresponding to all cycles of a binary matroid. 
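For a very small matrix, the cycles and circuits of $\mathcal{M}[H]$ can be listed by brute force. The sketch below uses the fact that a set of columns of $H$ sums to zero over $\mathbb{F}_2$ exactly when its incidence vector lies in the null space of $H$; the circuits are then the inclusion-minimal non-empty such sets. The matrix `H` and the function name are hypothetical, and the empty set is counted as a (trivial) cycle.

```python
from itertools import product

def cycles_and_circuits(H):
    """Return (cycles, circuits) of the binary matroid M[H] as frozensets of
    0-based column indices."""
    n = len(H[0])
    # Cycles: index sets whose columns sum to zero, i.e. supports of null-space vectors.
    cycles = [frozenset(j for j in range(n) if x[j])
              for x in product([0, 1], repeat=n)
              if all(sum(h * xj for h, xj in zip(row, x)) % 2 == 0 for row in H)]
    nonempty = [s for s in cycles if s]
    # Circuits: non-empty cycles that contain no smaller non-empty cycle.
    circuits = [s for s in nonempty if not any(t < s for t in nonempty)]
    return cycles, circuits

# Hypothetical example: two disjoint parity checks on four columns.
H = [[1, 1, 0, 0],
     [0, 0, 1, 1]]
cycles, circuits = cycles_and_circuits(H)
```

In this example the set of all four columns is a cycle but not a circuit, since it is the disjoint union of the two circuits.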

Some more relationships between coding theory and matroid
theory (see also [16]) can be listed: a binary linear code
corresponds to a binary matroid, the support of a codeword
corresponds to a cycle (therefore, each codeword corresponds
to the incidence vector of a cycle), the support of a minimal
codeword corresponds to a circuit, and the codeword polytope
$\mathrm{conv}(C)$ corresponds to the cycle polytope. Let $H$ be a binary
matrix, $\mathcal{M}[H]$ be the binary matroid defined by $H$ ($H$ is a
representation matrix of $\mathcal{M}[H]$) and $C$ be the binary linear
code defined by $H$ ($H$ is a parity-check matrix of $C$). It can
easily be shown that the dual $C^\perp$ of $C$ is the same object
as the dual of the binary matroid $\mathcal{M}[H]$. We denote the dual
matroid by $\mathcal{M}[G]$, where $G$ is the generator matrix of $C$.
Usually the matroid related terms are dualized by the prefix
"co". For example, the circuits and cycles of a dual matroid are
called cocircuits and cocycles, respectively. The supports of
minimal codewords in $C^\perp$ and the supports of arbitrary
codewords in $C^\perp$ are associated with cocircuits and cocycles
of $\mathcal{M}[H]$, respectively.

A minor of a parent matroid $\mathcal{M} = (J, \mathcal{U})$ is the sub-matroid
obtained from $\mathcal{M}$ after any combination of contraction
and restriction operations (see e.g. [14]). In the context of
coding theory, contraction corresponds to puncturing, i. e., the 
deletion of one or more columns from the generator matrix of 
a parent code, and restriction corresponds to shortening, i. e., 
the deletion of one or more columns from the parity-check 
matrix of a parent code. 
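The two code operations can be stated directly on the set of codewords. In the sketch below, puncturing deletes a coordinate from every codeword, while shortening first keeps only the codewords that are zero in that coordinate; the last lines confirm on a hypothetical example that shortening agrees with deleting the corresponding column of the parity-check matrix. All names and the matrix `H` are assumptions of this example.

```python
from itertools import product

def null_space(H):
    """All binary vectors x with H x = 0 (mod 2)."""
    n = len(H[0])
    return {x for x in product([0, 1], repeat=n)
            if all(sum(h * xj for h, xj in zip(row, x)) % 2 == 0 for row in H)}

def puncture(C, j):
    """Delete coordinate j from every codeword."""
    return {x[:j] + x[j + 1:] for x in C}

def shorten(C, j):
    """Keep the codewords with x_j = 0, then delete coordinate j."""
    return {x[:j] + x[j + 1:] for x in C if x[j] == 0}

# Hypothetical (5,2) code and the (0-based) coordinate to remove.
H = [[1, 0, 1, 0, 0],
     [1, 1, 0, 1, 0],
     [0, 1, 0, 0, 1]]
C = null_space(H)
j = 4
punctured = puncture(C, j)
shortened = shorten(C, j)

# Deleting column j of H yields a parity-check matrix of the shortened code.
H_deleted = [row[:j] + row[j + 1:] for row in H]
shortened_via_H = null_space(H_deleted)
```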

Next, some results from Barahona and Grötschel [17] which
are related to the structure of the cycle polytope are rewritten
in terms of coding theory. Kashyap provides a similar transfer
in [18]. Several results are collected in Theorem III.2.

Theorem III.2 Let $C$ be a binary linear code.

(a) If $d(C^\perp) \geq 3$ then the codeword polytope is full-dimensional.

(b) The box inequalities

$$0 \leq x_j \leq 1 \quad \text{for all } j \in J \qquad (3)$$

and the cocircuit inequalities

$$\sum_{j \in \mathcal{F}} x_j - \sum_{j \in \mathrm{supp}(q) \setminus \mathcal{F}} x_j \leq |\mathcal{F}| - 1 \quad \text{for all } \mathcal{F} \subseteq \mathrm{supp}(q) \text{ with } |\mathcal{F}| \text{ odd}, \qquad (4)$$

where $\mathrm{supp}(q)$ is the support of a dual minimal codeword
$q$, are valid for the codeword polytope.

(c) The box inequalities $x_j \geq 0$, $x_j \leq 1$ define facets of
the codeword polytope if $d(C^\perp) \geq 3$ and $j \in J$ is not
contained in the support of a codeword in $C^\perp$ with weight
three.

(d) If $d(C^\perp) \geq 3$ and $C$ does not contain $H_7^\perp$ (the (7,3,4) simplex
code) as a minor, and if there exists a dual minimal
codeword $q$ of weight 3, then the cocircuit inequalities
derived from $\mathrm{supp}(q)$ are facets of $\mathrm{conv}(C)$.
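Validity of the cocircuit inequalities in part (b) can be checked numerically on a toy code. The intuition is that a codeword overlaps the support of any dual codeword in an even number of positions, so it can never be one on all of an odd set $\mathcal{F}$ and zero on the rest of the support. The code `C`, the dual codeword `q`, and the helper names below are hypothetical.

```python
from itertools import combinations

def cocircuit_inequalities(support):
    """Yield (F, rest) for every odd-size subset F of the given support."""
    support = sorted(support)
    for size in range(1, len(support) + 1, 2):
        for F in combinations(support, size):
            yield set(F), set(support) - set(F)

def holds(x, F, rest):
    """Check sum_{j in F} x_j - sum_{j in rest} x_j <= |F| - 1."""
    return sum(x[j] for j in F) - sum(x[j] for j in rest) <= len(F) - 1

# Hypothetical (5,2) code and one of its dual minimal codewords q.
C = [(0, 0, 0, 0, 0), (1, 0, 1, 1, 0), (0, 1, 0, 1, 1), (1, 1, 1, 0, 1)]
q = (1, 1, 0, 1, 0)
q_support = {j for j, qj in enumerate(q) if qj}

inequalities = list(cocircuit_inequalities(q_support))
all_valid = all(holds(x, F, rest) for x in C for F, rest in inequalities)

# A binary vector that violates the parity check q is cut off by some inequality.
non_codeword = (1, 0, 0, 0, 0)
cut_off = any(not holds(non_codeword, F, rest) for F, rest in inequalities)
```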



Part (b) of Theorem III.2 implies that the set of cocircuit
inequalities derived from the supports of all dual minimal
codewords provides a relaxation of the codeword polytope.
In the polyhedral analysis of the codeword polytope the
symmetry property stated below plays an important role.

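Theorem III.3 [17] If $a^T x \leq \alpha$ defines a face of $\mathrm{conv}(C)$ of
dimension $d$, and $y$ is a codeword, then the inequality $\bar{a}^T x \leq \bar{\alpha}$
also defines a face of $\mathrm{conv}(C)$ of dimension $d$, where

$$\bar{a}_j = \begin{cases} a_j & \text{if } j \notin \mathrm{supp}(y), \\ -a_j & \text{if } j \in \mathrm{supp}(y), \end{cases}$$

and $\bar{\alpha} = \alpha - a^T y$.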
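The transformation in Theorem III.3 amounts to substituting $x \oplus y$ for $x$: since $a^T (x \oplus y) = \bar{a}^T x + a^T y$, the inequality $a^T (x \oplus y) \leq \alpha$ is the same as $\bar{a}^T x \leq \bar{\alpha}$. The sketch below applies the map to one cocircuit inequality of a toy code and checks this identity; the code `C`, the inequality, and the function names are hypothetical, and only validity (not the face dimension) is verified.

```python
def shift_inequality(a, alpha, y):
    """Map a^T x <= alpha to the inequality obtained by shifting with codeword y."""
    a_bar = [-aj if yj else aj for aj, yj in zip(a, y)]
    alpha_bar = alpha - sum(aj * yj for aj, yj in zip(a, y))
    return a_bar, alpha_bar

def dot(a, x):
    return sum(aj * xj for aj, xj in zip(a, x))

def xor(x, y):
    return tuple((xj + yj) % 2 for xj, yj in zip(x, y))

# Hypothetical (5,2) code and the valid inequality x_0 - x_1 - x_3 <= 0 (0-based).
C = [(0, 0, 0, 0, 0), (1, 0, 1, 1, 0), (0, 1, 0, 1, 1), (1, 1, 1, 0, 1)]
a, alpha = [1, -1, 0, -1, 0], 0
y = (1, 0, 1, 1, 0)

a_bar, alpha_bar = shift_inequality(a, alpha, y)
still_valid = all(dot(a_bar, x) <= alpha_bar for x in C)
# Slack of the new inequality at x equals slack of the old one at x XOR y.
same_slack = all(dot(a_bar, x) - alpha_bar == dot(a, xor(x, y)) - alpha for x in C)
```

In this example the shifted inequality is $x_3 - x_0 - x_1 \leq 0$, again a cocircuit inequality for the same dual codeword.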



Using this theorem, a complete description of $\mathrm{conv}(C)$ can
be derived from all facets containing a single codeword [17].

Let $q$ be a dual minimal codeword. To identify whether the
cocircuit inequalities derived from $\mathrm{supp}(q)$ are facet-defining,
it should be checked whether $\mathrm{supp}(q)$ has a chord. For the formal
definition of a chord, the symmetric difference $\triangle$, which operates
on two finite sets, is used, defined by $A \triangle B = (A \setminus B) \cup (B \setminus A)$.
Note that if $A = \mathrm{supp}(q_1)$, $B = \mathrm{supp}(q_2)$ and $\mathrm{supp}(q_0) =
A \triangle B$, then $q_0 = q_1 + q_2 \pmod 2$.

Definition III.4 Let $q_0, q_1, q_2 \in C^\perp$ be dual minimal codewords.
If $\mathrm{supp}(q_0) = \mathrm{supp}(q_1) \triangle \mathrm{supp}(q_2)$ and $\mathrm{supp}(q_1) \cap
\mathrm{supp}(q_2) = \{j\}$, then $j$ is called a chord of $\mathrm{supp}(q_0)$.



Theorem III.5 [17] Let $C$ be a binary linear code without
the (7,3,4) simplex code as a minor and let $\mathrm{supp}(q)$ be the
support of a dual minimal codeword with Hamming weight at
least 3 and without chord. Then for all $\mathcal{F} \subseteq \mathrm{supp}(q)$ with $|\mathcal{F}|$
odd, the inequality

$$\sum_{j \in \mathcal{F}} x_j - \sum_{j \in \mathrm{supp}(q) \setminus \mathcal{F}} x_j \leq |\mathcal{F}| - 1$$

defines a facet of $\mathrm{conv}(C)$.

Optimizing a linear cost function over the cycle polytope,
known as the cycle problem in terms of matroid theory, is
investigated by Grötschel and Truemper [19]. The work of
Feldman et al. [2] makes it possible to use the matroid theoretic
results in the coding theory context. As shown above, solving the
MLD problem for a binary linear code is equivalent to solving
the cycle problem on a binary matroid. In [19], binary matroids
for which the cycle problem can be solved in polynomial
time are classified, based on Seymour's matroid decomposition
theory [20]. Kashyap [16] shows that results from [19] are
directly applicable to binary linear codes. The MLD problem
as well as the minimum distance problem can be solved in
polynomial time for the code families for which the cycle
problem on the associated binary matroid can be solved in
polynomial time. These codes are called polynomially
almost-graphic codes [16].

An interesting subclass of polynomially almost-graphic 
codes are geometrically perfect codes. Kashyap translates the 
sum of circuits property (see fTOl) to the realm of binary 
linear codes. If the binary matroid associated with code $C$ has
the sum of circuits property then $\mathrm{conv}(C)$ can be described
completely and non-redundantly by the box inequalities (3)
and the cocircuit inequalities (4). These codes are referred to
as geometrically perfect codes in [16]. The associated binary
matroids of geometrically perfect codes can be decomposed
in polynomial time into their minors, which are either graphic
(see [14]) or contained in a finite list of matroids.

From a coding theoretic point of view, a family of error- 
correcting codes is asymptotically bad if either dimension 
or minimum distance grows only sublinearly with the code 
length. Kashyap proves that the family of geometrically perfect
codes unfortunately fulfills this property. We refer to [16] for
the generalizations of this result.

IV. Basics of LPD 

LPD was first introduced in [2]. This decoding method is,
in principle, applicable to any binary linear code over any
binary-input memoryless channel.² In this section, we review
the basics of the LPD approach based on [1].

Although several structural properties of $\mathrm{conv}(C)$ are
known, it is in general infeasible to compute a concise
description of $\mathrm{conv}(C)$ by means of linear inequalities. In LPD,
the linear cost function of the IP formulation is minimized on a
relaxed polytope $\mathcal{P}$ where $\mathrm{conv}(C) \subseteq \mathcal{P} \subseteq \mathbb{R}^n$. Such a relaxed
polytope $\mathcal{P}$ should have the following desirable properties:

• $\mathcal{P}$ should be easy to describe, and

• integral vertices of $\mathcal{P}$ should correspond to codewords.

Together with the linear representation (1) of the likelihood
function, this leads to one of the major benefits of LPD, the
so-called ML certificate property: If the LP decoder outputs
an integral optimal solution, it is guaranteed to be the ML
codeword. This is a remarkable difference from IMPD, where
only some sufficient optimality conditions can be used (see
e.g. [23, Sec. 10.3]), but no guaranteed method to decide the
optimality of a given solution is available.

Each row (check node) $i \in I$ of a parity-check matrix $H$
defines the local code

$$C_i = \Big\{ x \in \{0,1\}^n : \sum_{j \in N_i} x_j \equiv 0 \pmod 2 \Big\}$$

that consists of the bit sequences which satisfy the $i$-th parity-check
constraint; these are called local codewords. A particularly
interesting relaxation of $\mathrm{conv}(C)$ is

$$\mathcal{P} = \mathrm{conv}(C_1) \cap \dots \cap \mathrm{conv}(C_m) \subseteq [0,1]^n,$$

known as the fundamental polytope [24]. The vertices of the
fundamental polytope, the so-called pseudocodewords, are a
superset of $C$, where the difference consists only of non-integral
vertices. Consequently, optimizing over $\mathcal{P}$ implies the
ML certificate property. These observations are formally stated
in the following result (note that $C = C_1 \cap \dots \cap C_m$).

Lemma IV.1 Let $\mathcal{P} = \mathrm{conv}(C_1) \cap \dots \cap \mathrm{conv}(C_m)$. If
$C = C_1 \cap \dots \cap C_m$ then $\mathrm{conv}(C) \subseteq \mathcal{P}$ and $C = \mathcal{P} \cap \{0,1\}^n$.

The description complexity of the convex hull of any local
code $\mathrm{conv}(C_i)$ and thus $\mathcal{P}$ is usually much smaller than the
description complexity of the codeword polytope $\mathrm{conv}(C)$.
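A tiny example shows that the fundamental polytope can be strictly larger than $\mathrm{conv}(C)$. The sketch below tests membership in $\mathcal{P}$ check by check, using the forbidden set inequalities of Section V (which, together with the box constraints, describe each $\mathrm{conv}(C_i)$). The matrix `H`, the tolerance `tol`, and the function name are assumptions of this example, and enumerating all odd subsets is only practical for small check degrees.

```python
from itertools import combinations, product

def in_fundamental_polytope(H, x, tol=1e-9):
    """Test x against the box constraints and all forbidden set inequalities."""
    if any(xj < -tol or xj > 1 + tol for xj in x):
        return False
    for row in H:
        N_i = [j for j, h in enumerate(row) if h]
        for size in range(1, len(N_i) + 1, 2):      # odd subsets S of N_i
            for S in combinations(N_i, size):
                lhs = sum(x[j] for j in S) - sum(x[j] for j in N_i if j not in S)
                if lhs > len(S) - 1 + tol:
                    return False
    return True

# Hypothetical parity-check matrix; its code is {0000, 0111}.
H = [[1, 1, 1, 0],
     [1, 1, 0, 1],
     [1, 0, 1, 1]]
C = [x for x in product([0, 1], repeat=4)
     if all(sum(h * xj for h, xj in zip(row, x)) % 2 == 0 for row in H)]

fractional_point = (0.5, 0.5, 0.5, 0.5)
in_P = in_fundamental_polytope(H, fractional_point)
# Every codeword has x_0 = 0, hence so does every point of conv(C).
outside_conv_C = all(c[0] == 0 for c in C) and fractional_point[0] != 0
```

The point $(1/2, 1/2, 1/2, 1/2)$ satisfies every local constraint but has a non-zero first coordinate, so it lies in $\mathcal{P}$ and not in $\mathrm{conv}(C)$; the binary points of $\mathcal{P}$ are nevertheless exactly the codewords, as Lemma IV.1 states.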

LPD can be written as optimizing the linear objective
function on the fundamental polytope $\mathcal{P}$, i.e.,

$$\min\{\lambda^T x : x \in \mathcal{P}\}. \qquad (5)$$

Based on (5), the LPD algorithm which we refer to as bare
linear programming decoding (BLPD) is derived.

Bare LP decoding (BLPD)
Input: $\lambda \in \mathbb{R}^n$, $\mathcal{P} \subseteq [0,1]^n$.
Output: ML codeword or ERROR.

1: solve the LP given in (5)
2: if LP solution $x^*$ is integral then
3:    output $x^*$
4: else
5:    output ERROR
6: end if

Because of the ML certificate property, if BLPD outputs a 
codeword, then it is the ML codeword. BLPD succeeds if the 
transmitted codeword is the unique optimum of the LP given in 
(5). BLPD fails if the optimal solution is non-integral or the
ML codeword is not the same as the transmitted codeword. 
Note that the difference between the performance of BLPD 
and MLD is caused by the decoding failures for which BLPD 
finds a non-integral optimal solution. It should be emphasized 
that in case of multiple optima it is assumed that BLPD fails. 

In some special cases, the fundamental polytope $\mathcal{P}$ is
equivalent to $\mathrm{conv}(C)$, e.g., if the underlying Tanner graph
is a tree or forest [24]. In these cases MLD can be achieved
by BLPD. Note that in those cases also MSAD achieves MLD 
performance 0. 

Observe that the minimum distance of a code can be
understood as the minimum $\ell_1$ distance between any two
different codewords of $C$. Likewise the fractional distance of
the fundamental polytope $\mathcal{P}$ can be defined as follows.

Definition IV.2 [2] Let $V(\mathcal{P})$ be the set of vertices
(pseudocodewords) of $\mathcal{P}$. The fractional distance $d_{\mathrm{frac}}(\mathcal{P})$ is the
minimum $\ell_1$ distance between a codeword and any other vertex
of $V(\mathcal{P})$, i.e.,

$$d_{\mathrm{frac}}(\mathcal{P}) = \min\Big\{ \sum_{j=1}^{n} |x_j - \nu_j| : x \in C,\ \nu \in V(\mathcal{P}),\ x \neq \nu \Big\}.$$

It follows that the fractional distance is a lower bound
for the minimum distance of a code: $d(C) \geq d_{\mathrm{frac}}(\mathcal{P})$.
Moreover, both definitions are related as follows. Recall that
on the binary symmetric channel (BSC), MLD corrects at least
$\lceil d(C)/2 \rceil - 1$ bit flips. As shown in [1], LPD succeeds if at
most $\lceil d_{\mathrm{frac}}(\mathcal{P})/2 \rceil - 1$ errors occur on the BSC.

²In fact, Flanagan et al. [21] have recently generalized a substantial portion
of the LPD theory to the nonbinary case. Similarly, work has been done to
include channels with memory; see e.g. [22].

Analogously to the minimum distance, the fractional distance
is equivalent to the minimum $\ell_1$ weight of a non-zero
vertex of $\mathcal{P}$. This property is used by the fractional distance
algorithm (FDA) to compute the fractional distance of a binary
linear code [1]. If $\mathcal{M}$ is the set of inequalities describing
$\mathcal{P}$, let $\mathcal{M}'$ be the subset of those inequalities which are not
active at the all-zero codeword. Note that these are exactly
the inequalities with a non-zero right hand side. In FDA the
weight function $\sum_{j \in J} x_j$ is subsequently minimized on $\mathcal{P} \cap f$
(the points of $\mathcal{P}$ at which $f$ is active)
for all $f \in \mathcal{M}'$ in order to find the minimum-weight non-zero
vertex of $\mathcal{P}$.

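Fractional distance algorithm (FDA)
Input: $\mathcal{P} \subseteq [0,1]^n$.
Output: Minimum-weight non-zero vertex of $\mathcal{P}$.

1: for all $f \in \mathcal{M}'$ do
2:    Set $\mathcal{P}' = \mathcal{P} \cap f$.
3:    Solve $\min\{\sum_{j \in J} x_j : x \in \mathcal{P}'\}$.
4: end for
5: Choose the minimum value obtained over all $\mathcal{P}'$.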



A more significant distance measure than $d_{\mathrm{frac}}$ is the so-called
pseudo-distance, which quantifies the probability that
the optimal solution under LPD changes from one vertex of
$\mathcal{P}$ to another [25], [24]. Likewise, the minimum pseudo-weight
is defined as the minimum pseudo-distance from $0$
to any other vertex of $\mathcal{P}$ and therefore identifies the vertex
(pseudocodeword) which is most likely to cause a decoding
failure. Note that the pseudo-distance takes the channel's
probability measure into account and thus depends on the
chosen channel model.

Although no efficient algorithms are known to compute the
exact minimum pseudo-weight of the fundamental polytope
of a code, promising heuristics as well as analytical bounds
have been proposed [24], [25], [26].

V. LPD Formulations for Various Code Classes 

This section reviews various formulations of the polytope $\mathcal{P}$
from (5), leading to optimized versions of the general BLPD
algorithm for different classes of codes.

In Step 1 of BLPD the LP problem is solved by a general
purpose LP solver. These solvers usually employ the simplex
method since it performs well in practice. The simplex method
iteratively examines vertices of the underlying polytope until
the vertex corresponding to the optimal solution is reached.
If there exists a neighboring vertex for which the objective
function can be improved in the current step, the simplex
method moves to this vertex. Otherwise it stops. The procedure
of moving from one vertex to another is called a simplex
iteration. Details on the simplex algorithm can be found in
classical books about linear programming (see e.g. [27]).

The efficiency of the simplex method depends on the com- 
plexity of the constraint set describing the underlying polytope. 
Several such explicit descriptions of the fundamental polytope
$\mathcal{P}$ have been proposed in the LPD literature. Some can be
used for any binary linear code whereas others are specialized
for a specific code class. Using alternative descriptions of $\mathcal{P}$,
alternative LP decoders are obtained. In the following, we
present different LP formulations.

A. LP formulations for LDPC codes 

The solution algorithm referred to as BLPD in Section IV
was introduced by Feldman et al. [2]. In order to describe $\mathcal{P}$
explicitly, three alternative constraint sets are suggested by the
authors in the formulations BLPD1, BLPD2, and BLPD3. In
the following, some abbreviations are used to denote both the
formulation and the associated solution (decoding) algorithm,
e.g., solving an LP, subgradient optimization, neighborhood
search. The meaning will be clear from the context.

The first LP formulation, BLPD1, of [2] is applicable to
LDPC codes.



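$$\begin{aligned}
\min\ & \lambda^T x && && \text{(BLPD1)} \\
\text{s.t.}\ & \sum_{S \in E_i} w_{i,S} = 1 && i = 1, \dots, m && (6) \\
& x_j = \sum_{S \in E_i \text{ with } j \in S} w_{i,S} && \forall j \in N_i,\ i = 1, \dots, m && (7) \\
& 0 \leq x_j \leq 1 && j = 1, \dots, n \\
& 0 \leq w_{i,S} \leq 1 && \forall S \in E_i,\ i = 1, \dots, m
\end{aligned}$$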



Here, $E_i = \{S \subseteq N_i : |S| \text{ even}\}$ is the set of valid bit
configurations within $N_i$. The auxiliary variables $w_{i,S}$ used
in this formulation indicate which bit configuration $S \in E_i$
is taken at parity check $i$. In case of an integral solution, (6)
ensures that exactly one such configuration is attained at every
check node, while (7) connects the actual code bits, modeled
by the variables $x_j$, to the auxiliary variables: $x_j = 1$ if and
only if, for every check node $i \in N_j$, the set $S \in E_i$ selected
at check node $i$ contains $j$. Note
that here we consider the LP relaxation, so it is not guaranteed
that a solution of the above program is indeed integral.
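The role of the auxiliary variables can be illustrated by lifting a binary vector $x$ to the $w$ variables and checking constraints (6) and (7). The sketch below sets $w_{i,S} = 1$ exactly for the configuration that $x$ induces on $N_i$; if $x$ violates check $i$, that configuration has odd size, is not in $E_i$, and (6) fails. The matrix `H` and all function names are hypothetical, and indices are 0-based.

```python
from itertools import combinations

def even_subsets(N_i):
    """E_i: all even-size subsets of the check neighborhood N_i."""
    N_i = sorted(N_i)
    return [frozenset(S) for size in range(0, len(N_i) + 1, 2)
            for S in combinations(N_i, size)]

def neighborhoods(H):
    return [[j for j, h in enumerate(row) if h] for row in H]

def lift(H, x):
    """w[i][S] = 1 iff S is the set of ones of x within N_i (and S is in E_i)."""
    w = []
    for N_i in neighborhoods(H):
        chosen = frozenset(j for j in N_i if x[j])
        w.append({S: int(S == chosen) for S in even_subsets(N_i)})
    return w

def satisfies_blpd1(H, x, w):
    for i, N_i in enumerate(neighborhoods(H)):
        if sum(w[i].values()) != 1:                      # constraint (6)
            return False
        for j in N_i:                                    # constraint (7)
            if x[j] != sum(v for S, v in w[i].items() if j in S):
                return False
    return True

# Hypothetical parity-check matrix with check degrees 2, 3, 2.
H = [[1, 0, 1, 0, 0],
     [1, 1, 0, 1, 0],
     [0, 1, 0, 0, 1]]
codeword, non_codeword = (1, 0, 1, 1, 0), (1, 0, 0, 0, 0)
ok_codeword = satisfies_blpd1(H, codeword, lift(H, codeword))
ok_non_codeword = satisfies_blpd1(H, non_codeword, lift(H, non_codeword))
num_w_variables = sum(len(even_subsets(N_i)) for N_i in neighborhoods(H))
```

The number of auxiliary variables is $\sum_i 2^{|N_i| - 1}$, which is why this formulation is attractive only when the check degrees are small, as for LDPC codes.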

A second linear programming formulation for LDPC codes,
BLPD2, is obtained by employing the so-called forbidden
set (FS) inequalities [28]. The FS inequalities are motivated
by the observation that one can explicitly forbid those value
assignments to the variables of a parity check in which an odd
number of variables is set to one. For all local codewords in
$C_i$, it holds that



\[
\sum_{j \in S} x_j - \sum_{j \in N_i \setminus S} x_j \le |S| - 1 \qquad \forall S \in \Sigma_i,
\]

where $\Sigma_i = \{S \subseteq N_i : |S| \text{ odd}\}$. Feldman et al. show in [2]
that for each single parity-check code $C_i$, the FS inequalities
together with the box inequalities $0 \le x_j \le 1$, $j \in J$,
completely and non-redundantly describe $\operatorname{conv}(C_i)$ (the case
$|N_i| = 3$ as depicted in Fig. 2 is the only exception where the
box inequalities are not needed). In a more general setting,
Grötschel proved this result for cardinality homogeneous
set systems [29].
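That the FS inequalities forbid exactly the odd-weight assignments can be checked by enumeration for small check degrees. The sketch below is an illustration with hypothetical helper names, not code from [2] or [28]: it lists the violated FS inequalities of a point and confirms that, on binary points, even-weight vectors violate none while each odd-weight vector violates only the inequality belonging to its own support.

```python
from itertools import combinations, product

def odd_subsets(neighborhood):
    """All subsets S of the check neighborhood with |S| odd (the set Sigma_i)."""
    nodes = sorted(neighborhood)
    return [frozenset(c) for r in range(1, len(nodes) + 1, 2)
            for c in combinations(nodes, r)]

def fs_violations(x, neighborhood):
    """Odd subsets S whose forbidden set inequality is violated by x."""
    violated = []
    for S in odd_subsets(neighborhood):
        lhs = (sum(x[j] for j in S)
               - sum(x[j] for j in neighborhood if j not in S))
        if lhs > len(S) - 1:
            violated.append(S)
    return violated

def check_single_parity_check(degree):
    """True if, on {0,1}^degree, FS inequalities hold exactly for even weight."""
    nodes = range(degree)
    for x in product((0, 1), repeat=degree):
        bad = fs_violations(x, nodes)
        if sum(x) % 2 == 0 and bad:
            return False
        # An odd-weight x must violate only the inequality of its support.
        if sum(x) % 2 == 1 and bad != [frozenset(j for j in nodes if x[j])]:
            return False
    return True
```

Note that this enumeration only covers integral points; it does not verify the statement about $\operatorname{conv}(C_i)$ for fractional points.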

If the rows of $H$ are considered as dual codewords, the
set of FS inequalities is a reinvention of the cocircuit inequalities
explained in Section III. BLPD2 is given below.

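\begin{align*}
\min\quad & \lambda^T x \tag{BLPD2} \\
\text{s.t.}\quad & \sum_{j \in S} x_j - \sum_{j \in N_i \setminus S} x_j \le |S| - 1 && \forall S \in \Sigma_i,\ i = 1, \dots, m, \\
& 0 \le x_j \le 1 && j = 1, \dots, n.
\end{align*}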



Feldman et al. [2] apply BLPD using formulations BLPD1
or BLPD2 to LDPC codes. Under the BSC, the error-correcting
performance of BLPD is compared with MSAD on a random
rate-1/2 LDPC code with $n = 200$, $d_v = 3$, $d_c = 6$; with
MSAD and SPAD on a random rate-1/4 LDPC code with
$n = 200$, $d_v = 3$, $d_c = 4$; and with MSAD, SPAD, and MLD
on a random rate-1/4 LDPC code with $n = 60$, $d_v = 3$,
$d_c = 4$. On these codes, BLPD performs better than MSAD
but worse than SPAD. Using BLPD2, the FDA is applied
to random rate-1/4 LDPC codes with $n = 100, 200, 300, 400$,
$d_v = 3$ and $d_c = 4$ from an ensemble of Gallager [30]. For
$(n-1, n)$ Reed-Muller codes [31] with $4 \le n \le 512$ they
compare the classical distance with the fractional distance. The
numerical results suggest that the gap between both distances
grows with increasing block length.

Another formulation for LDPC codes is given in Section VI-B
in the context of efficient implementations.

In a remarkable work, Feldman and Stein [32] have shown
that the Shannon capacity of a channel can be achieved with
LP decoding, which implies a polynomial-time decoder and
the availability of an ML certificate. To this end, they use
a slightly modified version of BLPD1 restricted to expander
codes, which are a subclass of LDPC codes. See [32] for a
formal definition of expander codes as well as the details of
the corresponding decoder.



B. LP formulations for codes with high-density parity-check 
matrices 

The number of variables and constraints in BLPD1 as
well as the number of constraints in BLPD2 increase
exponentially in the check node degree. Thus, for codes with
high-density parity-check matrices, BLPD1 and BLPD2 are
computationally inefficient. A polynomial-sized formulation,
BLPD3, is based on the parity polytope of Yannakakis [33].
There are two types of auxiliary variables in BLPD3. The
variable $p_{i,k}$ is set to one if $k$ variable nodes are set to one
in the neighborhood of parity check $i$, for $k$ in the index set
$K_i = \{0, 2, \dots, 2\lfloor |N_i|/2 \rfloor\}$. Furthermore, the variable
$q_{j,i,k}$ is set to one if variable node $j$ is one of the $k$ variable
nodes set to one in the neighborhood of check node $i$.



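\begin{align*}
\min\quad & \lambda^T x \tag{BLPD3} \\
\text{s.t.}\quad & x_j = \sum_{k \in K_i} q_{j,i,k} && i \in N_j,\ j = 1, \dots, n, \\
& \sum_{k \in K_i} p_{i,k} = 1 && i = 1, \dots, m, \\
& \sum_{j \in N_i} q_{j,i,k} = k\, p_{i,k} && k \in K_i,\ i = 1, \dots, m, \\
& 0 \le x_j \le 1 && j = 1, \dots, n, \\
& 0 \le p_{i,k} \le 1 && k \in K_i,\ i = 1, \dots, m, \\
& 0 \le q_{j,i,k} \le p_{i,k} && k \in K_i,\ j = 1, \dots, n,\ i \in N_j.
\end{align*}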



Feldman et al. [2] show that BLPD1, BLPD2, and BLPD3
are equivalent in the sense that the $x$-variables of the optimal
solutions in all three formulations take the same values.
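To get a feeling for the different growth rates, the sizes contributed by a single check node of degree $d$ can be tabulated. The sketch below is an illustration only; it counts rows as the three formulations are commonly written (one equality per constraint, one two-sided bound per auxiliary variable), and this counting convention as well as the function name are assumptions rather than figures taken from [2].

```python
def formulation_sizes(check_degree):
    """Auxiliary variables and constraints contributed by one check node.

    Box constraints on the code bits x_j are shared by all formulations and
    are therefore not counted; each bound on an auxiliary variable counts once.
    """
    d = check_degree
    num_even = 2 ** (d - 1)   # |E_i|: subsets of N_i with even cardinality
    num_odd = 2 ** (d - 1)    # |Sigma_i|: subsets of N_i with odd cardinality
    num_k = d // 2 + 1        # |K_i| = |{0, 2, ..., 2*floor(d/2)}|
    return {
        # one row of type (6), d rows of type (7), one bound per w_{i,S}
        "BLPD1": {"aux_vars": num_even, "constraints": 1 + d + num_even},
        # one FS inequality per odd subset, no auxiliary variables
        "BLPD2": {"aux_vars": 0, "constraints": num_odd},
        # p_{i,k} and q_{j,i,k}; three equality families plus bounds on p and q
        "BLPD3": {"aux_vars": num_k + d * num_k,
                  "constraints": d + 1 + num_k + num_k + d * num_k},
    }
```

For a check of degree 10, this gives 512 auxiliary variables for BLPD1 and 512 inequalities for BLPD2, compared with 66 auxiliary variables and 83 rows for BLPD3.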

The number of variables and constraints in BLPD3 increases
as $O(n^3)$. By applying a decomposition approach, Yang et
al. [34] show that an alternative LP formulation can be obtained
whose size is linear in the code length and the check node
degrees (independently of [34], a similar decomposition approach
was also proposed in [35]). In the LP formulation of [34], a
high-degree check node is decomposed into several low-degree
check nodes. Thus, the resulting Tanner graph contains auxiliary
check and variable nodes. Fig. 3 illustrates this decomposition
technique: a check node with degree 4 is decomposed into 3 parity
checks each with degree at most 3. The parity-check nodes are
illustrated




Fig. 3. Check node decomposition. 

by squares. In the example, the original variables are denoted by
$v_1, \dots, v_4$, while the auxiliary variable node is named $v_5$. In
general, this decomposition technique is applied iteratively
until every check node has degree less than 4. The authors
show that the total number of variables in the formulation is
less than doubled by the decomposition. We refer to [34] for
the details of the decomposition.

For ease of notation, suppose $K$ is the set of parity-check
nodes after decomposition. If $d_c(k) = 3$, $k \in K$, then
the parity-check constraint $k$ is of the form
$v_1^k + v_2^k + v_3^k \equiv 0 \pmod 2$. Note that with our notation
some of these variables $v_s^k$ might represent the same variable
node $v_j$, e.g., $v_5$ from Fig. 3 would appear in two constraints of
the above form, as $v_s^k$ and $v_{s'}^{k'}$, respectively. Yang et al.
show that the parity-check constraint
$v_1^k + v_2^k + v_3^k \equiv 0 \pmod 2$ can be replaced by the
linear constraints

\[
v_1^k + v_2^k + v_3^k \le 2, \quad
v_1^k - v_2^k - v_3^k \le 0, \quad
v_2^k - v_1^k - v_3^k \le 0, \quad
v_3^k - v_1^k - v_2^k \le 0
\]

(for a single check node of degree 3 the box inequalities are not
needed). If $d_c(k) = 2$ then $v_1^k = v_2^k$ along with the box
constraints models the parity check. The constraint set of the
resulting LP formulation, which we call cascaded linear
programming decoding (CLPD), is the union of all constraints
modeling the $|K|$ parity checks.
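\begin{align*}
\min\quad & \lambda^T v \tag{CLPD} \\
\text{s.t.}\quad & \sum_{j \in S} v_j - \sum_{j \in N_k \setminus S} v_j \le |S| - 1 && \forall S \in \Sigma_k,\ k = 1, \dots, |K|, \\
& 0 \le v_j \le 1 && \forall j \in N_k \text{ if } d_c(k) \le 2.
\end{align*}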



In the objective function, only the $v$-variables corresponding to
the original $x$-variables have non-zero coefficients. Thus, the
objective function of CLPD is the same as that of BLPD1. The
constraints in CLPD are the FS inequalities used in BLPD2,
with the property that the degree of each check node is less
than 4.
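For a check node of degree 3, the FS inequalities are the four linear constraints $v_1 + v_2 + v_3 \le 2$, $v_1 - v_2 - v_3 \le 0$, $v_2 - v_1 - v_3 \le 0$, and $v_3 - v_1 - v_2 \le 0$ (one per odd subset). The short Python sketch below, with function names chosen here for illustration, enumerates the binary points satisfying them and shows that these are exactly the even-weight triples. Adding the second and third inequality gives $v_3 \ge 0$, and adding the first and fourth gives $v_3 \le 1$, which is why no separate box constraints are needed in this case.

```python
from itertools import product

def degree3_linear_constraints(v1, v2, v3):
    """The four FS inequalities of a parity check of degree 3."""
    return (v1 + v2 + v3 <= 2 and
            v1 - v2 - v3 <= 0 and
            v2 - v1 - v3 <= 0 and
            v3 - v1 - v2 <= 0)

def integral_solutions():
    """Binary triples that satisfy all four inequalities."""
    return [v for v in product((0, 1), repeat=3)
            if degree3_linear_constraints(*v)]
```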

Yang et al. prove that the formulations introduced in [2]
and CLPD are equivalent. Again, equivalence is used in the
sense that in an optimal solution, the $x$-variables of BLPD1,
BLPD2, BLPD3, and the variables of the CLPD formulation
which correspond to original $x$-variables take the same values.
Moreover, it is shown that CLPD can be used in FDA. As a
result, the computation of the fractional distance for codes with
high-density parity-check matrices is also facilitated. Note that
using BLPD2, the FDA has polynomial running time
only for LDPC codes. If $\mathcal{P}$ is described by the constraint set
of CLPD, then in the first step of the FDA, it is sufficient to
choose the set T from the facets formed by cutting planes of
type $v_1^k + v_2^k + v_3^k = 2$, where $v_1^k$, $v_2^k$, and $v_3^k$ are
variables of the CLPD formulation. Additionally, an adaptive
branch & bound method is suggested in [36] to find better bounds
for the minimum distance of a code. On a random rate-1/4 LDPC
code with $n = 60$, $d_v = 3$, $d_c = 4$, it is demonstrated that this
yields a better lower bound than the fractional distance does.

C. LP formulations for turbo-like codes 

The various LP formulations outlined so far have in common
that they are derived from a parity-check matrix which
defines a specific code. A different approach is to describe the
encoder by means of a finite state machine, which is the usual
way to define so-called convolutional codes. The bits of the
information word are successively fed into the machine, each
causing a state change that emits a fixed number of output
bits depending on both the current state and the input. In a
systematic code, the output always contains the input bit. The
codeword, consisting of the concatenation of all outputs, can
thus be partitioned into the systematic part, which is a copy of
the input, and the remaining bits, referred to as the parity
output.

A convolutional code is naturally represented by a trellis
graph (Fig. 4), which is obtained by unfolding the state diagram
in the time domain. Each vertex of the trellis represents
the state at a specific point in time, while edges correspond
to valid transitions between two subsequent states and are
labelled by the corresponding input and output bits. Each path
from the starting node to the end node corresponds to a codeword.¹
The cost of a codeword is derived from the received LLR
values and the edge labels on the path associated with this
codeword. See [23] for an in-depth survey of these concepts.

¹ We intentionally do not discuss trellis termination here and assume that
the encoder always ends in a fixed terminal state; cf. [23] for details.




Fig. 4. Excerpt from a trellis graph with four states and initial state 0. The
style of an edge indicates the corresponding information bit, while the labels
refer to the single parity bit.
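The correspondence between codewords and trellis paths means that a minimum-cost codeword of a single convolutional code can be found by a shortest-path computation. The following Python sketch illustrates this with dynamic programming on a two-state accumulator trellis. The concrete trellis (parity bit equal to state XOR input, next state equal to the parity bit), the cost convention (an LLR value is paid whenever the corresponding bit is 1), and all function names are assumptions made for the example; the optional `end_state` argument models a fixed terminal state.

```python
def accumulator_edges(state):
    """Edges leaving `state` in one segment: (input bit, parity bit, next state)."""
    return [(u, state ^ u, state ^ u) for u in (0, 1)]

def min_cost_path(llr_sys, llr_par, start_state=0, end_state=None):
    """Viterbi-style dynamic programming over the trellis.

    Returns (cost, input bits, parity bits) of a cheapest path. The cost of
    an edge is the sum of the LLR values of its bits that are equal to 1.
    """
    best = {start_state: (0.0, [], [])}
    for lam_s, lam_p in zip(llr_sys, llr_par):
        nxt = {}
        for state, (cost, ins, pars) in best.items():
            for u, p, new_state in accumulator_edges(state):
                cand = cost + lam_s * u + lam_p * p
                # Keep only the cheapest path into each state of the next layer.
                if new_state not in nxt or cand < nxt[new_state][0]:
                    nxt[new_state] = (cand, ins + [u], pars + [p])
        best = nxt
    if end_state is not None:
        return best[end_state]
    return min(best.values(), key=lambda t: t[0])
```

A call such as `min_cost_path([-1.0, 2.0, -3.0], [0.5, -0.5, 1.0])` returns the cheapest of the eight paths through the three-segment trellis.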



Convolutional codes are the building blocks of turbo codes,
which revolutionized coding theory because of their near
Shannon limit error-correcting performance [37]. An $(n, k)$
turbo code consists of two convolutional codes $C_a$ and $C_b$,
each of input length $k$, which are linked by a so-called
interleaver that requires the information bits of $C_a$ to match
those of $C_b$ after being scrambled by some permutation $\pi \in S_k$
which is fixed for a given code.² It is this coupling of rather
weak individual codes and the increase of complexity arising
therefrom that entails the vast performance gain of turbo codes.
A typical turbo code (and only this case is covered here;
it is straightforward to generalize) consists of two identical
systematic encoders of rate 1/2 each. Only one of the encoders
$C_a$ and $C_b$, however, contributes its systematic part to the
resulting codeword, yielding an overall rate of 1/3, i.e., $n = 3k$
(since their systematic parts differ only by a permutation,
including both would imply an embedded repetition code).
We thus partition a codeword $x$ into the systematic part $x^s$
and the parity outputs $x^a$ and $x^b$ of $C_a$ and $C_b$, respectively.

A turbo code can be compactly represented by a so-called
Forney-style factor graph (FFG) as shown in Fig. 5. As
opposed to Tanner graphs, in an FFG all nodes are functional
nodes, whereas the (half-)edges correspond to variables. In our
case, there are variables of two types, namely state variables
$s_j^\nu$ ($\nu \in \{a, b\}$), reflecting the state of $C_\nu$ at time step $j$,
and a variable for each bit of the codeword $x$. Each node $T_j^\nu$
represents the indicator function for a valid state transition in
$C_\nu$ at time $j$ and is thus incident to one systematic and one
parity variable as well as the "before" and "after" states
$s_{j-1}^\nu$ and $s_j^\nu$, respectively. Note that such a node $T_j^\nu$
corresponds to a vertical "slice" (often called a segment) of the
trellis graph of $C_\nu$, and each valid configuration of $T_j^\nu$ is
represented by exactly one edge in the respective segment.

Turbo codes are typically decoded by IMPD techniques
operating on the factor graph. Feldman [1], in contrast, introduced
an LP formulation, turbo code linear programming decoding
(TCLPD), for this purpose. This serves as an example that
mathematical programming is a promising approach in decoding
even beyond formulations based on parity-check matrices.

In TCLPD, the trellis graph of each constituent encoder $C_\nu$
is modeled by flow conservation and capacity constraints [39],
along with side constraints appropriately connecting the flow
variables $f^\nu$ to auxiliary variables $x^s$ and $x^\nu$, respectively,
which embody the codeword bits.

² Using exactly two constituent convolutional encoders eases notation and
is the most common case, although it is not essential for the concept; in
fact, recent developments suggest that the error-correcting performance
benefits from adding a third encoder [38].

Fig. 5. The factor graph of a turbo code. The interleaver links the systematic
bits $x^s$ of both encoders $C_a$ (upper part) and $C_b$ (lower part).

For $\nu \in \{a, b\}$, let $G_\nu = (S_\nu, E_\nu)$ be the trellis
corresponding to $C_\nu$, where $S_\nu$ is the index set of nodes (states)
and $E_\nu$ is the set of edges (state transitions) $e$ in $G_\nu$. Let
$s^{\mathrm{start},\nu}$ and $s^{\mathrm{end},\nu}$ denote the unique start and end
node, respectively, of $G_\nu$. We can now define a feasible flow
$f^\nu$ in the trellis $G_\nu$ by the system



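\begin{gather}
\sum_{e \in \operatorname{out}(s^{\mathrm{start},\nu})} f_e^\nu = 1, \qquad
\sum_{e \in \operatorname{in}(s^{\mathrm{end},\nu})} f_e^\nu = 1, \tag{8} \\
\sum_{e \in \operatorname{out}(s)} f_e^\nu = \sum_{e \in \operatorname{in}(s)} f_e^\nu
\qquad \forall s \in S_\nu \setminus \{s^{\mathrm{start},\nu}, s^{\mathrm{end},\nu}\}, \tag{9} \\
f_e^\nu \ge 0 \qquad \forall e \in E_\nu. \tag{10}
\end{gather}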

Let $I_j^\nu$ and $O_j^\nu$ denote the sets of edges in $G_\nu$ whose
corresponding input and output bit, respectively, is a 1 (both being
subsets of the $j$-th segment of $G_\nu$). The following constraints
relate the codeword bits to the flow variables:



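\begin{align}
x_j^\nu &= \sum_{e \in O_j^\nu} f_e^\nu && \text{for } j = 1, \dots, k \text{ and } \nu \in \{a, b\}, \tag{11} \\
x_j^s &= \sum_{e \in I_j^a} f_e^a && \text{for } j = 1, \dots, k, \tag{12} \\
x_{\pi(j)}^s &= \sum_{e \in I_j^b} f_e^b && \text{for } j = 1, \dots, k. \tag{13}
\end{align}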



We can now state TCLPD as 



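\begin{align*}
\min\quad & \sum_{\nu \in \{a, b\}} (\lambda^\nu)^T x^\nu + (\lambda^s)^T x^s \tag{TCLPD} \\
\text{s.t.}\quad & \text{(8)-(13) hold,}
\end{align*}

where $\lambda$ is split in the same way as $x$.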

The formulation straightforwardly generalizes to all sorts
of "turbo-like" codes, i.e., codes built from convolutional codes
plus interleaver conditions. In particular, Feldman and Karger
have applied TCLPD to repeat-accumulate (RA($\ell$)) codes
[40]. The encoder of an RA($\ell$) code repeats the information bits
$\ell$ times, and then sends them to an interleaver followed by an
accumulator, which is a two-state convolutional encoder. The
authors derive bounds on the error rate of TCLPD for RA
codes, which were later improved and extended by Halabi and
Even [41] as well as by Goldenberg and Burshtein.



Note that all $x$ variables in TCLPD are auxiliary: we could
replace each occurrence by the sum of flow variables defining
it. In doing so, (12) and (13) break down to the condition

\[
\sum_{e \in I_{\pi(j)}^a} f_e^a = \sum_{e \in I_j^b} f_e^b \qquad \text{for } j = 1, \dots, k. \tag{14}
\]



Because the rest of the constraints defines a standard network
flow, TCLPD models a minimum cost flow problem plus the
$k$ additional side constraints (14). Using a general-purpose
LP solver does not exploit this combinatorial substructure. As
was already suggested in [1], Lagrangian relaxation is applied
to (14) in [43] in order to recover the underlying shortest-path
problem. Additionally, the authors of [43] use a heuristic
based on computing the $K$ shortest paths in a trellis to
improve the decoding performance. Via the parameter $K$, the
trade-off between algorithmic complexity and error-correcting
performance can be controlled.

VI. Efficient LP Solvers for BLPD 

A successful realization of BLPD requires an efficient LP
solver. To this end, several ideas have been suggested in the
literature. CLPD (cf. Section V) can be considered an efficient
LPD approach since the numbers of variables and constraints
are significantly reduced. We review several others in this
section.

A. Solving the separation problem 

The approach of Taghavi and Siegel [44] tackles the large
number of constraints in BLPD2. In their separation approach,
called adaptive linear programming decoding (ALPD), not
all FS inequalities are included in the LP formulation as in
BLPD2. Instead, they are added iteratively when needed. As
in Definition III.5, the general idea is to start with a crude
LP formulation and then improve it. Note that this idea can
also be used to improve the error-correcting performance
(see Section VII). In the initialization step, the trivial LP
$\min\{\lambda^T x : x \in [0,1]^n\}$ is solved. Let $(x^*)^k$ be the optimal
solution in iteration $k$. Taghavi and Siegel show that it can
be checked in $O(m d_c^{\max} + n \log n)$ time whether $(x^*)^k$ violates
any FS inequality derived from $H_{i,\cdot}\, x \equiv 0 \pmod 2$ for all
$i \in I$ (recall that $m \times n$ is the dimension of $H$ and $d_c^{\max}$
is the maximum check-node degree). This check can be
considered as a special case of the greedy separation algorithm
(GSA) introduced in [29]. If some of the FS inequalities are
violated, then these inequalities are added to the formulation
and the modified LP is solved again with the new inequalities.
ALPD stops if the current optimal solution $(x^*)^k$ satisfies
all FS inequalities. If $(x^*)^k$ is integral then it is the ML
codeword, otherwise an error is output. ALPD does not yield
an improvement in terms of frame error rate since the same
solutions are found as in the formulations in the previous
section. However, the computational complexity is reduced.
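The separation step for a single parity check can be sketched in a few lines. The Python code below is an illustration and not necessarily the exact procedure of [44]: it uses the equivalent form $\sum_{j \in S} (1 - x_j) + \sum_{j \in N_i \setminus S} x_j \ge 1$ of the FS inequality, starts from the set of entries larger than 1/2, and, if that set has even size, toggles the entry closest to 1/2 to restore odd parity at minimum cost. The function name and the numerical tolerance are assumptions.

```python
def find_fs_cut(x, neighborhood):
    """Search for a violated FS inequality at one (non-empty) check node.

    Returns the odd set S of the most violated inequality, or None if x
    satisfies all FS inequalities of this check.
    """
    nodes = list(neighborhood)
    # Without the parity restriction, the left-hand side
    #   sum_{j in S} (1 - x_j) + sum_{j in N \ S} x_j
    # is minimized by putting j into S exactly when x_j > 1/2.
    S = {j for j in nodes if x[j] > 0.5}
    if len(S) % 2 == 0:
        # Restore odd parity as cheaply as possible: toggle the entry
        # closest to 1/2 (this changes the sum by |1 - 2 x_j|).
        j_flip = min(nodes, key=lambda j: abs(x[j] - 0.5))
        S ^= {j_flip}
    slack = (sum(1 - x[j] for j in S)
             + sum(x[j] for j in nodes if j not in S))
    return frozenset(S) if slack < 1 - 1e-9 else None
```

In an adaptive decoder, this routine would be called for every check node after each LP solve, and the returned inequalities would be added to the LP before re-solving; the LP solver itself is outside the scope of this sketch.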

An important algorithmic result of [44] is that ALPD
converges to the same optimal solution as BLPD2 with
significantly fewer constraints. It is shown empirically that in the
last iteration of ALPD, fewer constraints than in the formulations
BLPD2, BLPD3, and CLPD are used. Taghavi and Siegel [44]
prove that their algorithm converges to the optimal solution
on the fundamental polytope after at most $n$ iterations with at
most $n(m + 2)$ constraints.

Under the binary-input additive white Gaussian noise channel
(BIAWGNC), [44] uses various random $(d_v, d_c)$-regular
codes to test the effect of changing the check node degree,
the block length, and the code rate on the number of FS
inequalities generated and the convergence of their algorithm.
Setting $n = 360$ and rate $R = 1/2$, the authors vary the check
node degree in the range of 4 to 40 in their computational
testing. It is observed that the average and the maximum
number of FS inequalities remain below 270. The effect of
changing the block length $n$ between 30 and 1920 under $R = 1/2$
is demonstrated on a (3, 6)-regular LDPC code. For these
codes, it is demonstrated that the number of FS inequalities
used in the final iteration is generally between $0.6n$ and $0.7n$.
Moreover, it is reported that the number of iterations remains
below 16. The authors also investigate the effect of the rate
on the number of FS inequalities created. Simulations are
performed on codes with $n = 120$ and $d_v = 3$ where the
number of parity checks $m$ varies between 15 and 90. For
most values of $m$ it is observed that the average number of
FS inequalities ranges between $1.1m$ and $1.2m$. For ALPD,
BLPD2, and SPAD (50 iterations), the average decoding time
is tested for (3, 6)-regular and (4, 8)-regular LDPC codes with
various block lengths. It is shown that ALPD outperforms
BLPD with respect to computation time, while still being slower
than SPAD. Furthermore, increasing the check node degree
does not increase the computation time of ALPD as much
as the computation time of BLPD. The behavior of ALPD,
in terms of the number of iterations and the FS inequalities
used, under increasing SNR is tested on a (3, 6)-regular LDPC
code with $n = 240$. It is concluded that ALPD performs more
iterations and uses more FS inequalities for the instances on
which it fails. Thus, decoding time decreases with increasing SNR.

In [45], ALPD is improved further in terms of complexity.
The authors use some structural properties of the fundamental
polytope. Let $(x^*)^k$ be an optimal solution in iteration $k$. In
[44] it is shown that if $(x^*)^k$ does not satisfy an FS inequality
derived from check node $i$, then $(x^*)^k$ satisfies all other FS
inequalities derived from $i$ with strict inequality. Based on
this result, Taghavi et al. [45] modify ALPD and propose
the decoding approach we refer to as modified adaptive linear
programming decoding (MALPD). In the $(k+1)$-th iteration
of MALPD, it is checked in $O(m d_c^{\max})$ time whether $(x^*)^k$
violates any FS inequality derived from $H_{i,\cdot}\, x \equiv 0 \pmod 2$
for some $i \in I$. This check is performed only for those parity
checks $i \in I$ which do not induce any active FS inequality at
$(x^*)^k$. Moreover, it is proved that inactive FS inequalities at
iteration $k$ can be dropped. In any iteration of MALPD, there
are at most $m$ FS inequalities. However, the dropped inequalities
might be inserted again in a later iteration; therefore the
number of iterations for MALPD can be higher than for
ALPD.

B. Message passing-like algorithms 

An approach towards low-complexity LPD of LDPC codes
was proposed by Vontobel and Kötter in [46]. Based on an
FFG representation of an LDPC code, they derive an LP, called
primal linear programming decoding (PLPD), which is based
on BLPD1. The FFG, shown in Fig. 6, and the Tanner graph
are related as follows.




Fig. 6. A Forney-style factor graph for PLPD. 

For each parity check, the FFG exhibits a node $C_i$ which is
incident to a variable-edge $v_{i,j}$ for each $j \in N_i$ and demands
those adjacent variables to form a configuration that is valid
for the local code $C_i$, i.e., their sum must be even. This
corresponds to a check node in the Tanner graph and thus
to (6) and (7), except that now there are, for the moment,
independent local variables $v_{i,j}$ for each $C_i$. Additionally, the
FFG generalizes the concept of row-wise local codes $C_i$ to the
columns of $H$, in such a way that the $j$-th column is considered
a local repetition code $A_j$ that requires the auxiliary variables
$u_{j,i}$ for each $i \in N_j \cup \{0\}$ to be either all 1 or all 0. By this,
the variable nodes of the Tanner graph are replaced by check nodes
$A_j$ (recall that in an FFG all nodes have to be check nodes).
There is a third type of factor nodes, labelled by "=", which
simply require all incident variables to take on the same value.
These are used to establish consistency between the row-wise
variables $v_{i,j}$ and the column-wise variables $u_{j,i}$ as well as
to connect the codeword variables $x_j$ to the configurations of
the $A_j$.

From this discussion it is easily seen that the FFG indeed
ensures that any configuration of the $x_j$ is a valid codeword.
The outcome of writing down the constraints for each node
and relaxing integrality conditions on all variables is the LP

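\begin{align*}
\min\quad & \lambda^T x \tag{PLPD} \\
\text{s.t.}\quad & x_j = u_{j,0} && j = 1, \dots, n, \\
& u_{j,i} = v_{i,j} && \forall (i, j) \in I \times J : H_{i,j} = 1, \\
& u_{j,i} = \sum_{S \in A_j,\, S \ni i} \alpha_{j,S} && \forall i \in N_j \cup \{0\},\ j = 1, \dots, n, \\
& \sum_{S \in A_j} \alpha_{j,S} = 1 && \text{for all } j = 1, \dots, n, \\
& v_{i,j} = \sum_{S \in E_i,\, S \ni j} w_{i,S} && \forall j \in N_i,\ i = 1, \dots, m, \\
& \sum_{S \in E_i} w_{i,S} = 1 && \text{for all } i = 1, \dots, m, \\
& \alpha_{j,S} \ge 0 && \forall S \in A_j,\ j = 1, \dots, n, \\
& w_{i,S} \ge 0 && \forall S \in E_i,\ i = 1, \dots, m,
\end{align*}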



where the sets $E_i$ are defined as in (BLPD1).

While bloating BLPD1 in this manner seems inefficient
at first glance, the reason behind it is that the LP dual of
PLPD leads to an FFG which is topologically equivalent to
the one of the primal LP, which allows one to use the graphical
structure for solving the dual. After manipulating constraints
of the dual problem to obtain a closely related, "softened"
dual linear programming decoding (SDLPD) formulation, the
authors propose a coordinate-ascent-type algorithm resembling
the min-sum algorithm and show convergence under certain
assumptions. In this algorithm, all the edges of the FFG are
updated according to some schedule. It is shown that the
update calculations required during each iteration can be
efficiently performed by the SPAD. The coordinate-ascent-type
algorithm for SDLPD is guaranteed to converge if all the edges
of the FFG are updated cyclically.

Under the BIAWGNC, the authors compare the error-correcting
performance of the coordinate-ascent-type algorithm
(max iterations: 64, 256) against the performance of the
MSAD (max iterations: 64, 256) on the (3, 6)-regular LDPC
code with $n = 1000$ and rate $R = 1/2$. MSAD performs slightly
better than the coordinate-ascent-type algorithm. In summary,
[46] shows that it is possible to develop LP-based algorithms
with complexities similar to IMPD.

The convergence and the complexity of the coordinate-ascent-type
algorithm proposed in [46] are studied further
in [47] by Burshtein. His algorithm has a new scheduling
scheme and its convergence rate and computational complexity
are analyzed under this scheduling. With this new scheduling
scheme, the decoding algorithm from [46] yields an iterative
approximate LPD algorithm for LDPC codes with complexity
in $O(n)$. The main difference between the two algorithms is
the selection and update of edges of the FFG. In [46] all
edges are updated cyclically during one iteration, whereas in
[47] only a few selected edges are updated during one particular
iteration. The edges are chosen according to the variable values
obtained during previous iterations.

C. Nonlinear programming approach 

As an approximation of BLPD for LDPC codes, Yang et al.
[36] introduce the box constraint quadratic programming
decoding (BCQPD) whose time complexity is linear in the code
length. BCQPD is a nonlinear programming approach derived 
from the Lagrangian relaxation (see Q for an introduction 
to Lagrangian relaxation) of BLPD1. To obtain BCQPD, a
subset of the constraints is incorporated into the
objective function. To simplify notation, Yang et al. rewrite the
constraint blocks (6) and (7) in the general form $Ay = b$ by
defining a single variable vector $y = (x, w)^T \in \{0, 1\}^K$ (so $K$
is the total number of variables in BLPD1) and choosing $A$ and
$b$ appropriately. Likewise, the objective function coefficients
are rewritten in a vector $c$, which equals $\lambda$ followed by the
appropriate number of zeros. The resulting formulation is
$\min\{c^T y : Ay = b,\ y \in \{0, 1\}^K\}$. Using a multiplier $\alpha > 0$,
the Lagrangian of this problem is

\begin{align*}
\min\quad & c^T y + \alpha (Ay - b)^T (Ay - b) \\
\text{s.t.}\quad & 0 \le y_k \le 1 \quad \text{for } k = 1, \dots, K.
\end{align*}

If $Ay = b$ is violated then a positive value is added to
the original objective function $c^T y$, i.e., the solution $y$ is
penalized. Setting $Q = 2\alpha A^T A$ and $r = c - 2\alpha A^T b$, the
BCQPD problem

\begin{align*}
\min\quad & y^T Q y + 2 r^T y \tag{BCQPD} \\
\text{s.t.}\quad & 0 \le y_k \le 1 \quad \text{for } k = 1, \dots, K
\end{align*}

is obtained. Since $Q$ is a positive semi-definite matrix, i.e., the
objective function is convex, and since the set of constraints
constitutes a box, each $y_k$ can be minimized separately. This
leads to efficient serial and parallel decoding algorithms. Two
methods are proposed in [36] to solve the BCQPD problem,
the projected successive overrelaxation method (PSORM) and
the parallel gradient projection method (PGPM). These methods
are generalizations of Gauss-Seidel and Jacobi methods
[48] with the benefit of faster convergence if proper weight
factors are chosen. PSORM and PGPM benefit from the
low-density structure of the underlying parity-check matrix.
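A coordinate-wise scheme of this kind can be sketched as follows. The code is a generic projected successive overrelaxation iteration for a box-constrained quadratic program of the form $\min y^T Q y + 2 r^T y$, $0 \le y_k \le 1$, written here for dense Python lists; it is an illustration under the assumptions that $Q$ is symmetric with positive diagonal entries and that the relaxation factor `omega` lies in $(0, 2)$, and it is not claimed to reproduce the exact update rules or weight factors of [36]. With `omega = 1` the update reduces to a projected Gauss-Seidel step.

```python
def objective(Q, r, y):
    """Value of y^T Q y + 2 r^T y."""
    n = len(y)
    quad = sum(y[k] * Q[k][l] * y[l] for k in range(n) for l in range(n))
    return quad + 2 * sum(r[k] * y[k] for k in range(n))

def projected_sor(Q, r, omega=1.0, sweeps=100):
    """Projected successive overrelaxation on the unit box.

    Each coordinate is moved towards its one-dimensional minimizer (scaled
    by omega) and then clipped to [0, 1]; updated entries are used
    immediately within the same sweep.
    """
    n = len(r)
    y = [0.5] * n
    for _ in range(sweeps):
        for k in range(n):
            # Half of the partial derivative of the objective w.r.t. y_k.
            grad_half = sum(Q[k][l] * y[l] for l in range(n)) + r[k]
            y[k] = min(1.0, max(0.0, y[k] - omega * grad_half / Q[k][k]))
    return y
```

For `Q = [[2, 1], [1, 2]]` and `r = [-3, 0]` the unconstrained minimizer lies outside the box, and the iteration settles at the corner `[1.0, 0.0]`. In a decoder, the sums over `l` would only run over the non-zero entries of row `k`, which is where the sparsity of the parity-check matrix pays off.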

One of the disadvantages of IMPD is the difficulty of
analyzing the convergence behavior of such algorithms. Yang
et al. showed both theoretically and empirically that BCQPD
converges under some assumptions if PSORM or PGPM is
used to solve the quadratic programming problem. Moreover,
the complexity of BCQPD is smaller than the complexity of
SPAD. For numerical tests, the authors use a product code
with block length $4^5 = 1024$ and rate $(3/4)^5 \approx 0.237$. The
BIAWGNC is used. It is observed that the PSORM method
converges faster than PGPM. The error-correcting performance
of SPAD is poor for product codes due to their regular
structure. For the chosen product code, Yang et al. demonstrate
that PSORM outperforms SPAD in computational complexity
as well as in error-correcting performance.

D. Efficient LPD of SPC product codes 

The class of single parity-check (SPC) product codes is
of special interest in [34]. The authors prove that for SPC
product codes the fractional distance is equal to the minimum
Hamming distance. Due to this observation, the minimum
distance of SPC product codes can be computed in polynomial
time using FDA. Furthermore, they propose a low-complexity
algorithm which approximately computes the CLPD optimum
for SPC product codes. This approach is based on the
observation that the parity-check matrix of an SPC product code
can be decomposed into component SPC codes. A Lagrangian
relaxation of CLPD is obtained by keeping the constraints
from only one component code in the formulation and moving
all other constraints to the objective function with a penalty
vector. The resulting Lagrangian dual problem is solved by
subgradient algorithms (see [7]). Two alternatives, subgradient
decoding (SD) and joint subgradient decoding (JSD), are
proposed. It can be proved that subgradient decoders converge
under certain assumptions.

The convergence behavior of SD as a function of the number
of iterations is tested on the (4,4) SPC product code, which
has length $n = 256$, rate $R = (3/4)^4 \approx 0.32$ and is defined as
the product of four SPC codes of length 4 each. All variants
tested (obtained by keeping the constraints from component
code $j = 1, 2, 3, 4$ in the formulation) converge in less than 20
iterations. For demonstrating the error-correcting performance
of SD if the number of iterations is set to 5, 10, 20, 100, the
(5,2) SPC product code ($n = 25$, rate $R = (4/5)^2 = 0.64$)
is used. The error-correcting performance is improved by
increasing the number of iterations. Under the BIAWGNC,
this code and the (4,4) SPC product code are used to compare
the error-correcting performance of SD and JSD with the
performance of BLPD and MLD. It should be noted that for
increasing SNR values, the error-correcting performance of
BLPD converges to that of MLD for SPC codes. JSD and SD
approach the BLPD curve for the code with $n = 25$. For the
SPC product code with $n = 256$ the subgradient algorithms
perform worse than BLPD. For both codes, the error-correcting
performance of JSD is superior to SD. Finally, the (10, 3) SPC
product code with $n = 1000$ and rate $R = (9/10)^3 = 0.729$
is used to compare the error-correcting performance of SD
and JSD with the SPAD. Again the BIAWGNC is used. It
is observed that SD performs slightly better than the SPAD
with a similar computational complexity. JSD improves the
error-correcting performance of the SD at the cost of increased
complexity.
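The block lengths and rates quoted for the SPC product codes follow from the fact that the product of $d$ SPC codes of length $\ell$ has length $\ell^d$ and rate $((\ell - 1)/\ell)^d$. A short Python check (the function name is chosen here for illustration) reproduces the numbers using exact fractions:

```python
from fractions import Fraction

def spc_product_code(component_length, dimensions):
    """Block length and rate of the product of `dimensions` SPC codes of equal length."""
    n = component_length ** dimensions
    rate = Fraction(component_length - 1, component_length) ** dimensions
    return n, rate
```

For instance, `spc_product_code(4, 4)` gives length 256 and rate 81/256, which is approximately 0.32.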



E. Interior point algorithms 

Efficient LPD approaches based on interior point algorithms
are studied by Vontobel [49], Wadayama [50], and Taghavi
et al. [45]. The use of interior point algorithms to solve LP
problems as an alternative to the simplex method was initiated
by Karmarkar [51]. In these algorithms, a starting point in
the interior of the feasible set is chosen. This starting point
is iteratively improved by moving through the interior of the
polyhedron in some descent direction until the optimal solution
or an approximation is found. There are various interior point
algorithms and for some, polynomial time convergence can be
proved. This is an advantage over the simplex method, which
has exponential worst-case complexity.

The proposed interior point algorithms aim at using the
special structure of the LP problem. The resulting running time
is a low-degree polynomial function of the block length. Thus,
fast decoding algorithms based on interior point algorithms
may be developed for codes with large block lengths. In
particular, affine scaling algorithms [49], primal-dual interior point
algorithms [45], [49], and a primal path-following interior point
algorithm [50] are considered. The bottleneck operation in
interior point methods is to solve a system of linear equations
depending on the current iteration of the algorithm. Efficient
approaches to solve this system of equations are proposed in
[49], [45], the latter containing an extensive study, including
investigation of appropriate preconditioners for the often
ill-conditioned equation system. The speed of convergence to the
optimal vertex of the algorithms in [50] and [45] under the
BIAWGNC is demonstrated on a nearly (3, 6)-regular LDPC
code with $n = 1008$, $R = 1/2$ and a randomly generated (3, 6)-regular
LDPC code with $n = 2000$, respectively.

VII. Improving the Error-Correcting 
Performance of BLPD 

The error-correcting performance of BLPD can be improved
by techniques from integer programming. Most of the
improvement techniques can be grouped into cutting plane
or branch & bound approaches. In this section, we review
the improved LPD approaches mainly with respect to this
categorization.

A. Cutting plane approaches 

The fundamental polytope $\mathcal{P}$ can be tightened by cutting
plane approaches. In the following, we refer to valid
inequalities as inequalities satisfied by all points in conv(C).
Valid cuts are valid inequalities which are violated by some
non-integral vertex of the LP relaxation. Feldman et al. [2]
already address this concept; besides applying the "lift and
project" technique, which is a generic tightening method for
integer programs [52], they also strengthen the relaxation by
introducing redundant rows into the parity-check matrix (or,
equivalently, redundant parity-checks into the Tanner graph)
of the given code (cf. Section II). When using the BLPD2
formulation, additional FS inequalities can be derived from the
redundant parity-checks without increasing the number of
variables. We refer to such inequalities as redundant parity-check
(RPC) inequalities. RPC inequalities may include valid cuts
which increase the probability that LPD outputs a codeword.
An interesting question relates to the types of inequalities
required to describe the codeword polytope conv(C) exactly.
It turns out that conv(C) cannot be described completely by
using only FS and box inequalities; the (7, 3, 4) simplex code
(dual of the (7, 4, 3) Hamming code) is given as a
counter-example in [2]. More generally, it can be concluded from [53]
that these types of inequalities do not suffice to describe all
facets of a simplex code.

RPCs can also be interpreted as dual codewords. As such,
for interesting codes there are exponentially many RPC
inequalities. The RPC inequalities cutting off the non-integral
optimal solutions are called RPC cuts [44]. An analytical study
of the circumstances under which RPCs can induce cuts is carried
out in [24]. Most notably, it is shown that RPCs obtained by
adding no more than $\frac{g-2}{2}$ dual codewords, where g is the
length of a shortest cycle in the Tanner graph, never change
the fundamental polytope.

There are several heuristic approaches in the LPD literature
to find cut-inducing RPCs [2], [44], [54], [55]. In [2], RPCs
which result from adding any two rows of H are appended
to the original parity-check matrix. The authors of [44] find
RPCs by randomly choosing cycles in the fractional subgraph
of the Tanner graph, which is obtained by choosing only
the fractional variable nodes and the check nodes directly
connected to them. They give a theorem which states that every
possible RPC cut must be generated by such a cycle. Their
approach is a heuristic one since the converse of that theorem
does not hold. In [54], the column index set corresponding to an
optimal LP solution is sorted. By re-arranging H and bringing
it to row echelon form, RPC cuts are searched. In [55], the
parity-check matrix is reformulated such that unit vectors are
obtained in the columns of the parity-check matrix which
correspond to fractional valued bits in the optimal solution
of the current LP. RPC cuts are derived from the rows of the
modified parity-check matrix.

The approaches in [28], [44], and [55] rely on a noteworthy
structural property of the fundamental polytope. Namely, it
can be shown that no check node of the associated Tanner
graph (regardless of the existence of redundant parity-checks)
can be adjacent to only one non-integral valued variable node.

Feldman et al. [2] test the lift and project technique on a
random rate-1/4 LDPC code with n = 36, $d_v = 3$, and $d_c = 4$
under the BIAWGNC. Moreover, a random rate-1/4 LDPC code
with n = 40, $d_v = 3$, and $d_c = 4$ is used to demonstrate
the error-correcting performance of BLPD when the original
parity-check matrix is extended by all those RPCs obtained by
adding any two rows of the original matrix. Both tightening
techniques improve the error-correcting performance of BLPD,
though the benefit of the latter is rather small, due to the
above-mentioned condition on cycle lengths.

The idea of tightening the fundamental polytope is usually
implemented as a cutting plane algorithm, i.e., the separation
problem is solved (see Definition III.5 and Section VI-A). In
cutting plane algorithms, an LP is solved which contains only
a subset of the constraints of the corresponding optimization
problem. If the optimal LP solution is a codeword, then
the cutting plane algorithm terminates and outputs the ML
codeword. Otherwise, valid cuts from a predetermined family
of valid inequalities are searched. If some valid cuts are found,
they are added to the LP formulation and the LP is re-solved.
In [44], [54], [55], the family of valid cuts consists of FS
inequalities derived from RPCs.



In [54], the main motivation for the greedy cutting plane
algorithm is to improve the fractional distance. This is
demonstrated for the (7, 4, 3) Hamming code, the (24, 12, 8) Golay
code, and a (204, 102) LDPC code. As a byproduct, it is shown
under the BSC on the (24, 12, 8) Golay code and a (204, 102)
LDPC code that the RPC-based approach of [54] improves the
error-correcting performance of BLPD.

In the improved LPD approach of [44], first ALPD (see
Section VI) is applied. If the solution is non-integral, an RPC
cut search algorithm is employed. This algorithm can be briefly
outlined as follows:

1) Given a non-integral optimal LP solution $x^*$, remove all
variable nodes j for which $x^*_j$ is integral from the Tanner
graph.

2) Find a cycle by randomly walking through the pruned
Tanner graph.

3) Sum up (in $\mathbb{F}_2$) the rows of H which correspond to the
check nodes in the cycle.

4) Check if the resulting RPC introduces a cut.
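
The four steps above can be sketched in Python as follows. This is a
minimal illustration and not the implementation of [44]: the function
names (`fs_cut`, `random_cycle_checks`, `rpc_cut_search`), the
non-backtracking walk, the retry limit `tries`, and the small example
matrix are assumptions made here. Step 4 uses the usual separation
test for FS inequalities of a single check: take the set S of
positions with value above 1/2, make |S| odd by toggling the position
closest to 1/2, and test whether the resulting inequality is violated.

```python
import random


def fs_cut(row, x, tol=1e-9):
    """Return the odd set S of a violated FS inequality of this check, or None."""
    support = [j for j, h in enumerate(row) if h]
    if not support:
        return None
    S = {j for j in support if x[j] > 0.5}
    if len(S) % 2 == 0:
        # Make |S| odd at the least cost: toggle the entry closest to 1/2.
        S ^= {min(support, key=lambda j: abs(x[j] - 0.5))}
    lhs = sum(1 - x[j] for j in S) + sum(x[j] for j in support if j not in S)
    return S if lhs < 1 - tol else None  # FS inequality demands lhs >= 1


def random_cycle_checks(H, x, rng, tol=1e-9):
    """Steps 1 and 2: random walk on the fractional subgraph.

    Returns the check indices on the first closed cycle, or None at a dead end.
    """
    frac = [j for j, v in enumerate(x) if tol < v < 1 - tol]
    if not frac:
        return None
    adj = {}
    for j in frac:
        for i, row in enumerate(H):
            if row[j]:
                adj.setdefault(("v", j), []).append(("c", i))
                adj.setdefault(("c", i), []).append(("v", j))
    node, prev, path, pos = ("v", rng.choice(frac)), None, [], {}
    while node not in pos:
        pos[node] = len(path)
        path.append(node)
        nxt = [u for u in adj.get(node, []) if u != prev]  # no backtracking
        if not nxt:
            return None
        prev, node = node, rng.choice(nxt)
    return sorted(i for kind, i in path[pos[node]:] if kind == "c")


def rpc_cut_search(H, x, rng, tries=20):
    """Steps 3 and 4: sum the rows on a cycle over F2 and test for a cut."""
    for _ in range(tries):
        checks = random_cycle_checks(H, x, rng)
        if checks is None:
            continue
        rpc = [sum(H[i][j] for i in checks) % 2 for j in range(len(x))]
        S = fs_cut(rpc, x)
        if S is not None:
            return rpc, S
    return None


# Hypothetical example: x satisfies all FS inequalities of the rows of H,
# but the sum of the three rows yields a parity-check that cuts it off.
H = [[1, 1, 0, 1, 0, 0],
     [0, 1, 1, 0, 1, 0],
     [1, 0, 1, 0, 0, 1]]
x = [0.5, 0.5, 0.5, 1.0, 0.0, 0.0]
print(rpc_cut_search(H, x, random.Random(0)))
```

Because the converse of the cycle theorem does not hold, a cycle may
fail to produce a cut; the sketch therefore simply retries with a new
random walk and gives up after a fixed number of attempts.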



The improved decoder of [44] performs noticeably better than
BLPD and SPAD. This is shown under the BIAWGNC on
(3, 4)-regular LDPC codes with n = 32, 100, 240.

The cutting plane approach of [55] is based on an IP
formulation of MLD, which is referred to as IPD. (Note
that this formulation was already mentioned in [9].) Auxiliary
variables $z \in \mathbb{Z}^m$ model the binary constraints $Hx = 0$ over
$\mathbb{F}_2$ in the real number field $\mathbb{R}$.

\begin{align*}
\min\; & \lambda^T x \tag{IPD}\\
\text{s.t.}\; & Hx - 2z = 0\\
& x \in \{0,1\}^n,\ z \in \mathbb{Z}^m
\end{align*}
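
The role of the auxiliary variables can be illustrated by a small
brute-force experiment in Python. The sketch below enumerates all
binary vectors x, keeps those for which an integer z with
Hx - 2z = 0 exists (i.e., every entry of Hx is even), and returns
the cheapest one. It is only meant for toy sizes; the function name
`ipd_brute_force` and the example data are hypothetical.

```python
from itertools import product


def ipd_brute_force(H, llr):
    """Solve IPD by enumeration; returns (cost, x, z) of a best feasible point."""
    best = None
    for x in product((0, 1), repeat=len(llr)):
        sums = [sum(h * xj for h, xj in zip(row, x)) for row in H]
        if any(t % 2 for t in sums):
            continue  # no integer z satisfies Hx - 2z = 0 for this x
        z = [t // 2 for t in sums]
        cost = sum(l * xj for l, xj in zip(llr, x))
        if best is None or cost < best[0]:
            best = (cost, list(x), z)
    return best


# Length-3 repetition code (codewords 000 and 111) with hypothetical LLRs.
H = [[1, 1, 0],
     [0, 1, 1]]
llr = [-1.0, 0.5, -0.25]
print(ipd_brute_force(H, llr))  # (-0.75, [1, 1, 1], [1, 1])
```

If the integrality of x and z is dropped, z = Hx/2 is feasible for
every x in the unit cube, so the plain LP relaxation of IPD carries no
information about the parity-checks beyond the box constraints.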

In [55], the LP relaxation of IPD is the initial LP problem
which is solved by a cutting plane algorithm. Note that the
LP relaxation of IPD is not equivalent to the LP relaxations
given in Section V. In almost all improved (in the
error-correcting performance sense) LPD approaches reviewed in
this article, BLPD is run first. If BLPD fails, some
technique to improve BLPD is used with the goal of finding
the ML codeword at the cost of increased complexity.
In contrast, the approach by Tanatmis et al. in [55] does
not elaborate on the solution of BLPD, but immediately
searches for cuts which can be derived from arbitrary dual
codewords. To this end, the parity-check matrix is modified
and the conditions under which certain RPCs define cuts
are checked. The average number of iterations performed
and the average number of cuts generated in the separation
algorithm decoding (SAD) of [55] are presented for random
(3, 6)-regular codes with n = 40, 80, 160, 200, 400 and for
the (31, 10), (63, 39), (127, 99), (255, 223) BCH codes. Both
performance measures seem to be directly proportional to the
block length. The error-correcting performance of SAD is
measured on random (3, 4)-regular LDPC codes with block
lengths 100 and 200, and on Tanner's (155, 64) group-structured
LDPC code [56]. It is demonstrated that the improved LPD
approach of [55] performs better than BLPD applied in the
adaptive setting [44] and better than SPAD. One significant
numerical result is that SAD proposed in [55] performs
much better than BLPD for the (63, 39) and (127, 99) BCH
codes, which have high-density parity-check matrices. In all
numerical simulations the BIAWGNC is used.

Yufit et al. [57] improve SAD [55] and ALPD [44] by
employing several techniques. The authors propose to improve
the error-correcting performance of these decoding methods by
using RPC cuts derived from alternative parity-check matrices
selected from the automorphism group of C, Aut(C). In the
alternative parity-check matrices, the columns of the original
parity-check matrix are permuted according to some scheme.
At the first stage of Algorithm 1 of [57], SAD is used to
solve the MLD problem. If the ML codeword is found, then
Algorithm 1 terminates; otherwise an alternative parity-check
matrix from Aut(C) is randomly chosen and SAD is
applied again. In the worst case this procedure is repeated N
times, where N denotes a predetermined constant. A similar
approach is also used to improve ALPD in Algorithm 2 of
[57]. Yufit et al. enhance Algorithm 1 with two techniques
to improve the error-correcting performance and complexity.
The first technique, called parity-check matrix adaptation, is
to alter the parity-check matrix prior to decoding such that
unit vectors are obtained in the columns of the parity-check
matrix which correspond to the least reliable bits, i.e., bits
with the smallest absolute LLR values. The second technique,
which is motivated by MALPD of [45], is to drop the inactive
inequalities at each iteration of SAD in order to prevent the
problem size from increasing from iteration to iteration. Under
the BIAWGNC, it is demonstrated on the (63, 36, 11) BCH code
and the (63, 39, 9) BCH code that SAD can be improved both
in terms of error-correcting performance and computational
complexity.

B. Facet guessing approaches 

Based on BLPD2, Dimakis et al. [28] improve the
error-correcting performance of BLPD with an approach similar
to FDA (see Section V). They introduce facet guessing
algorithms which iteratively solve a sequence of related LP
problems. Let $x^*$ be a non-integral optimal solution of BLPD,
$x^{\mathrm{ML}}$ be the ML codeword, and $\mathcal{F}$ be the set of faces of
$\mathcal{P}$ which do not contain $x^*$. This set $\mathcal{F}$ is given by the
set of inequalities which are not active at $x^*$.

The set of active inequalities of a pseudocodeword v is
denoted by $A(v)$. In facet guessing algorithms, the objective
function $\lambda^T x$ is minimized over $f \cap \mathcal{P}$ for all
$f \in \mathcal{K} \subseteq \mathcal{F}$, where $\mathcal{K}$ is an arbitrary subset of
$\mathcal{F}$. The optimal solutions are stored in a list. In random
facet guessing decoding (RFGD), $|\mathcal{K}|$ of the faces $f \in \mathcal{F}$
are chosen randomly. If $\mathcal{K} = \mathcal{F}$, then exhaustive facet
guessing decoding (EFGD) is obtained. From the list of optimal
solutions, the facet guessing algorithms output the integer
solution with minimum objective function value. It is shown that
EFGD fails if, for every $f \in A(x^{\mathrm{ML}})$, there exists a
pseudocodeword $v \in f$ such that $\lambda^T v < \lambda^T x^{\mathrm{ML}}$.
For suitable expander codes this result is combined with the
following structural property of expander-based codes, also
proven by the authors: the number of active inequalities at some
codeword is much higher than at a non-integral pseudocodeword.
Consequently, theoretical bounds on the decoding success
conditions of the polynomial time algorithms EFGD and RFGD for
expander codes are derived. The numerical experiments are
performed under the BIAWGNC, on Tanner's (155, 64)
group-structured LDPC code and on a random LDPC code with
n = 200, $d_v = 3$, $d_c = 4$. For these codes RFGD performs
better than SPAD.

C. Branch & bound approaches 

Linear programming based branch & bound is an implicit
enumeration technique in which a difficult optimization problem
is divided into multiple, but easier subproblems by fixing
the values of certain discrete variables. We refer to [7] for a
detailed description. Several authors improved LPD using the
branch & bound approach.

Breitbach et al. [9] solved IPD by a branch & bound
approach. Depth-first and breadth-first search techniques are
suggested for exploring the search tree. The authors point
out the necessity of finding good bounds in the branch &
bound algorithm and suggest a neighborhood search heuristic
as a means of computing upper bounds. In the heuristic, a
formulation is used which is slightly different from IPD. We
refer to this formulation as alternative integer programming
decoding (AIPD). AIPD can be obtained by using error
vectors. Let $y = \frac{1}{2}(1 - \operatorname{sign}(\lambda))$ be the hard decision for
the LLR vector $\lambda$ obtained from the BIAWGNC. Comparing
$y \in \{0,1\}^n$ with a codeword $x \in C$ results in an error vector
$e \in \{0,1\}^n$, i.e., $e = y + x \pmod 2$. Let $s = Hy$, and define
$\bar{\lambda}$ by $\bar{\lambda}_j = |\lambda_j|$. IPD can be reformulated as

\begin{align*}
\min\; & \bar{\lambda}^T e \tag{AIPD}\\
\text{s.t.}\; & He - 2z = s\\
& e \in \{0,1\}^n,\ z \in \mathbb{Z}^m.
\end{align*}
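
The equivalence of the two objectives can be checked coordinate by
coordinate. The following short derivation is a sketch under the
assumptions that no LLR value is exactly zero and that $\bar{\lambda}_j =
|\lambda_j|$ denotes the modified cost vector (the bar notation is
used here for illustration). Since $x = y + e \pmod 2$,

\begin{align*}
\lambda_j > 0 \ (y_j = 0):\quad x_j &= e_j, & \lambda_j x_j &= |\lambda_j|\, e_j,\\
\lambda_j < 0 \ (y_j = 1):\quad x_j &= 1 - e_j, & \lambda_j x_j &= \lambda_j - \lambda_j e_j = |\lambda_j|\, e_j + \lambda_j,
\end{align*}

and summing over all coordinates gives

\[
\lambda^T x = \bar{\lambda}^T e + \sum_{j:\, \lambda_j < 0} \lambda_j .
\]

The last sum does not depend on e, so both problems have the same
minimizers. Likewise, $Hx \equiv 0 \pmod 2$ holds if and only if
$He \equiv Hy \pmod 2$, which is the constraint $He - 2z = s$ with an
integer vector z.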

In the neighborhood search heuristic of [9], first a feasible
starting solution $e^0$ is calculated by setting the coordinates of
$e^0$ corresponding to the n - m most reliable bits (i.e., those
$j \in J$ such that $|\lambda_j|$ are largest) to 0. These are the
non-basic variables, while the m basic variables are found from the
vector $s \in \{0,1\}^m$. Starting from this solution, a neighborhood
search is performed by exchanging basic and non-basic variables. The
tuple of variables yielding a locally best improvement in the
objective function is selected for iterating to the next feasible
solution.
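
One way to compute such a starting solution is Gaussian elimination
over F2 with the columns processed from least to most reliable. The
Python sketch below follows this idea; it is not taken from [9]. The
function name `initial_error_vector` is hypothetical, and skipping
columns that are linearly dependent on earlier pivots is an assumption
added here so that the sketch also works when the m least reliable
columns do not form a basis.

```python
def initial_error_vector(H, s, reliability):
    """Return e with He = s (mod 2) and zeros on the non-basic positions."""
    m, n = len(H), len(H[0])
    rows = [H[i][:] + [s[i]] for i in range(m)]  # augmented matrix [H | s]
    order = sorted(range(n), key=lambda j: reliability[j])  # least reliable first
    basis, r = [], 0
    for j in order:
        piv = next((i for i in range(r, m) if rows[i][j]), None)
        if piv is None:
            continue  # column depends on earlier pivots: keep it non-basic
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(m):
            if i != r and rows[i][j]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[r])]
        basis.append((r, j))
        r += 1
        if r == m:
            break
    e = [0] * n
    for i, j in basis:
        e[j] = rows[i][n]  # basic variables are read off the reduced syndrome
    return e


# (7, 4) Hamming code with hypothetical reliabilities |lambda_j|.
H = [[1, 0, 1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
s = [1, 0, 1]
reliability = [0.1, 0.2, 2.0, 0.3, 1.5, 1.7, 1.9]
print(initial_error_vector(H, s, reliability))  # [1, 0, 0, 1, 0, 0, 0]
```

In this example the three least reliable positions (0, 1, and 3)
become the basic variables, and the four most reliable positions are
fixed to 0.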

In [9], numerical experiments are performed under the
BIAWGNC, on the (31, 21, 5) BCH code, the (64, 42, 8)
Reed-Muller code, the (127, 85, 13) BCH code, and the
(255, 173, 23) BCH code. The neighborhood search with single
position exchanges performs very similarly to MLD for
the (31, 21, 5) BCH code. As the block length increases, the
error-correcting performance of the neighborhood search with
single position exchanges gets worse. An extension of this
heuristic allowing two position exchanges is applied to the
(64, 42, 8) Reed-Muller code, the (127, 85, 13) BCH code,
and the (255, 173, 23) BCH code. The extended neighborhood
search heuristic improves the error-correcting performance at
the cost of increased complexity. A branch & bound algorithm
is simulated on the (31, 21, 5) BCH code and different search
tree exploration schemes are investigated. The authors suggest
a combination of depth-first and breadth-first search.

In [58], Draper et al. improve the ALPD approach of
[44] with a branch & bound technique. Branching is done
on the least certain variable, i.e., $x_j$ such that
$|x^*_j - 0.5|$ is smallest for $j \in J$. Under the BSC, it is
observed on Tanner's (155, 64, 20) code that the ML codeword is
found after few iterations in many cases.

In [36], two branch & bound approaches for LDPC codes
are introduced. In ordered constant depth decoding (OCDD)
and ordered variable depth decoding (OVDD), first BLPD1
is solved. If the optimal solution $x^*$ is non-integral, a subset
$\mathcal{T} \subseteq \mathcal{E}$ of the set $\mathcal{E}$ of all non-integral bits is chosen.
Let $g = |\mathcal{T}|$. The subset $\mathcal{T}$ consists of the least certain
bits. The term "ordered" in OCDD and OVDD is motivated
by this construction. It is experimentally shown in [36] that
choosing the least certain bits is advantageous in comparison
to a random choice of bits. OVDD is a breadth-first branch &
bound algorithm where the depth of the search tree is restricted
to g. Since this approach is common in integer programming,
we do not give the details of OVDD and refer to [7] instead.
For OVDD, the number of LPs solved in the worst case is
$2^{g+1} - 1$.

In OCDD, m-element subsets $\mathcal{M}$ of $\mathcal{T}$, i.e., $\mathcal{M} \subseteq \mathcal{T}$ and
$m = |\mathcal{M}|$, are chosen. Let $b \in \{0,1\}^m$. For any $\mathcal{M} \subseteq \mathcal{T}$, $2^m$
LPs are solved, each time adding a constraint block

\[
x_k = b_k \quad \text{for all } k \in \mathcal{M}
\]

to BLPD1, thus fixing m bits. Let $\hat{x}$ be the solution with the
minimum objective function value among the $2^m$ LPs solved.
If $\hat{x}$ is integral, OCDD outputs $\hat{x}$; otherwise another subset
$\mathcal{M} \subseteq \mathcal{T}$ is chosen. Since OCDD exhausts all m-element
subsets of $\mathcal{T}$, in the worst case $\binom{g}{m} 2^m + 1$ LPs are solved.
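
The two worst-case counts are easy to compare numerically. The
following Python sketch evaluates them; the function names and the
sample values of g and m are hypothetical and are not the settings
used in [36]. The OVDD count is the number of nodes of a full binary
tree of depth g, and the OCDD count is the initial LP plus $2^m$ LPs
for each of the $\binom{g}{m}$ subsets.

```python
from math import comb


def ovdd_worst_case(g):
    """Number of nodes in a full binary search tree of depth g (one LP each)."""
    return 2 ** (g + 1) - 1


def ocdd_worst_case(g, m):
    """Initial LP plus 2^m LPs for each of the C(g, m) m-element subsets."""
    return comb(g, m) * 2 ** m + 1


print(ovdd_worst_case(5))     # 63
print(ocdd_worst_case(6, 2))  # 61
```

With these sample values the two decoders solve a comparable number
of LPs in the worst case, although OCDD considers more candidate bits
(g = 6) at a smaller fixing depth (m = 2).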

The branch & bound based improved LPD of Yang et al.
[36] can be applied to LDPC codes with short block length.
For the following numerical tests, the BIAWGNC is used.
Under various settings of m and g it is shown on a random LDPC
code with n = 60, R = 1/4, $d_c = 4$, and $d_v = 3$ that OCDD has
a better error-correcting performance than BLPD and SPAD.
Several simulations are done to analyze the trade-off between
complexity and error-correcting performance of OCDD and
OVDD. For the test instances and parameter settings¹ used in
[36] it has been observed on the above-mentioned code that
OVDD outperforms OCDD. This behavior is explained by the
observation that OVDD applies the branch & bound approach
on the most unreliable bits. On a longer random LDPC code
with n = 1024, R = 1/4, $d_c = 4$, and $d_v = 3$, it is demonstrated
that OVDD performs better than BLPD and SPAD.

Another improved LPD technique which can be interpreted
as a branch & bound approach is randomized bit guessing
decoding (RBGD) of Dimakis et al. [28]. RBGD is inspired
by the special case that all facets chosen by RFGD (see
Section VII-B) correspond to constraints of type $x_j \ge 0$ or
$x_j \le 1$. In RBGD, $k = c \log n$ variables, where $c > 0$ is a
constant, are chosen randomly. Because there are $2^k$ different
possible configurations of these k variables, BLPD2 is run
$2^k$ times with associated constraints for each assignment. The
best integer valued solution in terms of the objective function
$\lambda^T x$ is the output of RBGD. Note that by setting k to
$c \log n$, a polynomial complexity in n is ensured. Under the
assumption that there exists a unique ML codeword, exactly one
of the $2^k$ bit settings matches the bit configuration in the ML
codeword. Thus, RBGD fails if a non-integral pseudocodeword
with a better objective function value coincides with the ML
codeword in all k components. For some expander codes, the
probability that RBGD finds the ML codeword is given
in [28]. To find this probability expression, the authors first
prove that, for some expander-based codes, the number of
non-integral components in any pseudocodeword scales linearly in
the block length.
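
The polynomial bound on the number of LPs follows from a one-line
calculation. The sketch below assumes that the logarithm is taken to
base 2; the base is not stated explicitly above, and a different base
only changes the exponent by a constant factor.

\[
2^{k} = 2^{c \log_2 n} = \left(2^{\log_2 n}\right)^{c} = n^{c},
\qquad\text{whereas}\qquad
2^{c \ln n} = n^{c \ln 2}.
\]

Hence RBGD solves at most $n^c$ LPs (or $n^{c \ln 2}$ for the natural
logarithm), each of which can be solved in time polynomial in n.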

Chertkov and Chernyak [59] apply the loop calculus
approach [60], [61] to improve BLPD. Loop calculus is an
approach from statistical physics and related to cycles in
the Tanner graph representation of a code. In the context of
improved LPD, it is used either to modify objective function
coefficients [59] or to find branching rules for branch and
bound [62]. Given a parity-check matrix and a channel output,
linear programming erasure decoding (LPED) [59] first solves
BLPD. If a codeword is found, then the algorithm terminates. If
a non-integral pseudocodeword is found, then a so-called critical
loop is searched by employing loop calculus. The indices
of the variable nodes along the critical loop form an index set
$\mathcal{M} \subseteq J$. LPED lowers the objective function coefficients
$\lambda_j$ of the variables $x_j$, $j \in \mathcal{M}$, by multiplying
$\lambda_j$ with $\epsilon$, where $0 \le \epsilon < 1$. After updating the
objective function coefficients, BLPD is solved again. If BLPD
does not find a codeword, then the selection criterion for the
critical loop is improved. LPED is tested on the list of
pseudocodewords found in [35] for Tanner's (155, 64, 20) code. It
is demonstrated that LPED corrects the decoding errors of BLPD
for this code.

¹ The parameters m and g are chosen such that OVDD and OCDD have
similar worst case complexity.

In [62], Chertkov combines the loop calculus approach used
in LPED [59] with RFGD [28]. We refer to the combined
algorithm as loop guided guessing decoding (LGGD). LGGD
differs from RFGD in the sense that the constraints chosen are
of type $x_j \ge 0$ or $x_j \le 1$, where j is in the index set
$\mathcal{M}$, the index set of the variable nodes in the critical loop.
LGGD starts with solving BLPD. If the optimal solution is
non-integral, then the critical loop is found with the loop calculus
approach. Next, a variable $x_j$, $j \in \mathcal{M}$, is selected randomly
and two partial LPD problems are deduced. These differ
from the original problem by only one equality constraint,
$x_j = 0$ or $x_j = 1$. LGGD chooses the minimum of the
objective values of the two subproblems. If the corresponding
pseudocodeword is integral, then the algorithm terminates.
Otherwise the equality constraints are dropped, a new $j \in \mathcal{M}$
along the critical loop is chosen, and two new subproblems are
constructed. If the set $\mathcal{M}$ is exhausted, the selection criterion
of the critical loop is improved. LGGD is very similar to OCDD
of [36] for the case that $g = |\mathcal{M}|$ and m = 1. In LGGD,
branching is done on the bits in the critical loop, whereas in
OCDD branching is done on the least reliable bits. As in [59],
LGGD is tested on the list of pseudocodewords generated in
[35] for Tanner's (155, 64, 20) code. It is shown that LGGD
improves BLPD under the BIAWGNC.

SAD of [55] is improved in terms of error-correcting
performance by a branch & bound approach in [57]. In
Algorithm 3 of [57], first SAD is employed. If the solution
is non-integral, then a depth-first branch & bound is applied.
The non-integral valued variable with smallest LLR value is
chosen as the branching variable. Algorithm 3 terminates as
soon as the search tree reaches the maximally allowed depth
$D_p$. Under the BIAWGNC, on the (63, 36, 11) BCH code and
the (63, 39, 9) BCH code Yufit et al. [57] demonstrate that the
decoding performance of Algorithm 3 (enhanced with
parity-check matrix adaptation) approaches MLD.

VIII. Conclusion 

In this survey we have shown how the decoding of binary
linear block codes benefits from a wide range of concepts
which originate from mathematical optimization: mostly
linear programming, but also quadratic (nonlinear) and integer
programming, duality theory, branch & bound methods,
Lagrangian relaxation, network flows, and matroid theory.
Bringing together both fields of research leads to promising
new algorithmic decoding approaches as well as a deeper
structural understanding of linear block codes in general and
special classes of codes, like LDPC and turbo-like codes, in
particular. The most important reason for the success of this
connection is the formulation of MLD as the minimization of
a linear function over the codeword polytope conv(C). We
have reviewed a variety of techniques for approximating
this polytope, whose description complexity in general is too
large to be computed efficiently.

For further research on LPD of binary linear codes, two 
general directions can be distinguished. One is to decrease 
the algorithmic complexity of LPD towards reducing the gap 
between LPD and IMPD, the latter of which still outperforms 
LPD in practice. The other direction aims at increasing error- 
correcting performance, tightening up to MLD performance. 
This includes a continued study of RPCs as well as the 
characterization of other, non-RPC facet-defining inequalities 
of the codeword polytope. 

There are other lines of research related to LPD and IMPD
which are not covered in this article. Flanagan et al. [21] have
generalized LP decoding, along with several related concepts,
to nonbinary linear codes. Another possible generalization
is to extend to different channel models [22]. Connecting
two seemingly different decoding approaches, the structural
relationship between LPD and IMPD has been discussed in
[63]. Moreover, the discovery that both decoding methods are
closely related to the Bethe free energy approximation, a tool
from statistical physics, has initiated vital research [64]. Also,
of course, research on IMPD itself, independent of LPD, is still
ongoing with high activity. A promising direction of research
is certainly the application of message passing techniques to
mathematical programming problems beyond LPD [65].